CHAPTER 1
INTRODUCTION 

The problem of efficient management and control of large scale
systems has been extremely challenging to control engineers. There are
essentially two main issues of concern. The modeling issue is complicated
by the large dimension of the system. The crucial problem here is one
of model simplification, i.e., how to obtain a simplified low-order model of
the system which would result in an acceptable control design [2,4,5]. In
large scale systems the model simplification problem is intimately related
to notions of time-scales, weak-coupling and controllability-observability
[1-8]. The control design issue is complicated by the presence of
multiple decision makers having possibly different goals and possessing
decentralized information. The crucial problem here is to obtain optimal
multicontroller strategies under nonclassical information patterns and
various cooperative and noncooperative solution concepts [9-13]. In large
scale system design the intricate relationship between the modeling and
strategy design issues introduces additional complexities not encountered
when each problem is considered in isolation. This is because
many aspects of the system structure vary under the control actions.
Many cases of ill-posed closed-loop designs based on reduced-order models
have been reported (see for example [45]). The complexities become more
involved when there are multiple decision makers as opposed to a centralized
decision maker [21-24]. This is because each decision maker's perception
of the system structure and dynamics may be altered by the actions of the other
decision makers. Hence any approach towards developing an efficient design
methodology must treat the modeling and strategy design issues in a unified
framework.

The central theme of this thesis is multimodeling. It is
concerned with modeling and control strategy interaction in a multimodel
context. In large scale system design, it is desirable to allow the decision
makers to use different simplified models of the system [63], due to: i) the
necessity to ease the computational burden associated with simulation,
analysis, and design; ii) the need to obtain a simplified control structure
which is feasible to implement; and iii) a lack of adequately modeled dynamics
of some parts of the system. In this thesis we study realistic situations
which allow the decision makers to use different models of the system. It is
our purpose to strengthen and extend the multimodeling concept beyond the
framework within which it was originally introduced in [14,15]. Towards
this end, we examine three different approaches to multimodeling. Firstly,
we consider situations in which a rational choice of the multimodeling scheme
is made a priori, based solely on the model structure. To establish the
validity of such a scheme we then examine its impact on the design of control
strategies. Specifically, our two main issues of concern are the
preservation of stability and a minimal loss in performance. Secondly, we
explore multimodeling possibilities in numerical algorithms which compute
near-optimal policies. Finally, we attempt to induce multimodel solutions
by an appropriate restructuring of the problem and a suitable choice of
admissible strategies. We hope our study will reveal the interplay between
structural features of the system, such as time-scales, weak-coupling, and
controllability-observability, and strategy design under nonclassical
information patterns, and help us achieve a better understanding of the
multimodeling concept.


The concept of multimodel strategies for large scale systems was
introduced in [14,15] within the framework of multiparameter singular
perturbations. In this framework, a large scale system is viewed as
consisting of a "slow" core coupled to a number of "fast" subsystems. A
multimodel situation results when each decision maker models the dynamics of
one fast subsystem and assumes a certain reduced-order equivalent of the
rest of the system. The design objective of each decision maker is assumed
to be compatible with the multimodel assumptions, i.e., each decision maker is
assumed not to penalize the neglected fast dynamics in his objective
functional. In [15,16], an attempt was made to interpret this practical
multimodel situation as a perturbation problem, since the "k-th model
simplification" is achieved by the "k-th parameter perturbation." Under the
assumptions that the fast subsystems were weakly-coupled among themselves and
that each fast subsystem was affected by the control of one decision maker
alone, the perturbation analysis in [15,16] established sufficient conditions
for the multimodel response to be close to the actual system response. The
analysis served as a basis for a decomposed design approach wherein each
decision maker had to solve a separate low-order control problem in the fast
time-scale, and jointly solve a low-order game problem in the slow time-scale.
The two problems were solved independently to form the composite strategies,
which were shown to stabilize the overall system for sufficiently small
values of the perturbation parameters, provided each of the low-order
problems had a stabilizing solution. Furthermore, the multimodel solution
was shown to be the asymptotic limit of the optimal solution, thus establishing
its well-posedness.

In Chapters 2, 3, and 4, we continue to study the role of
time-scales in multimodel strategy design within the framework of
multiparameter singular perturbations. Specifically, we attempt to establish
the validity of multimodel generation by "k-th parameter perturbation" for
classes of linear deterministic systems and linear stochastic systems under
nonclassical information patterns.

The structural assumptions in [15,16] correspond to practical
situations where the fast subsystems are geographically distinct, each under
the direct influence of one decision maker who interacts with the other
decision makers only through the slow core [3]. But there might be situations
where subsystem characterization by time-scales does not correspond to
geographically distinct areas (in which case the fast subsystems might not be
weakly-coupled), and/or a mutual relocation of controls among the decision
makers might not be possible due to the inherent noncooperative nature of the
problem (in which case each fast subsystem might be controlled by more than
one decision maker). In Chapter 2, we examine the implications of relaxing
the structural assumptions made in [15,16]. The general multiparameter game
problem has been formulated in [17], and the ill-posedness of the limiting
solution has been demonstrated through some examples. This happens because
now the decision makers face game situations in both the fast and slow
time-scales, unlike in [15,16] where they faced a control problem in the fast
time-scale. In Chapter 2, we demonstrate that multimodel generation by "k-th
parameter perturbation" is still well-posed provided each decision maker
solves his problem by the hierarchical reduction scheme of single parameter
games [21]. Unlike the multimodel solution of [15,16], the above procedure
does not guarantee stability of the overall system unless the coupling
between the fast subsystems is "limited," though not necessarily "weak."

In [15,16] only deterministic problems with full state information
for each decision maker were treated. The analysis involved examining the
limiting solution of Riccati equations or coupled Riccati equations only. At
that stage it was not clear whether multimodel generation by "k-th
parameter perturbation" would be well-posed for stochastic problems with
nonclassical information patterns, where the optimal solution may involve
integro-differential equations of no particular standard type. In Chapters 3
and 4 we establish the validity of such multimodel generation for a class of
stochastic Nash and team problems. The weak-coupling assumption on the fast
subsystems is retained to focus on aspects of randomness and nonclassical
information patterns.

In Chapter 5 we consider the average-cost-per-stage problem for
finite-state Markov chains with multiple decision makers. The existing
results on Markov games are few [65], and do not provide us with a proper
framework to study the multimodeling problem directly. For this reason we
first obtain fundamental existence results for Nash and Stackelberg solutions
for cases when each decision maker knows only the current value of the state,
and when the leader also has access to the followers' controls at every
stage. An algorithm is obtained for computing an affine incentive strategy for
the leader which helps him achieve his global optimum. The practical
usefulness of Markovian decision processes has been severely limited by
the extremely large dimension of most Markov chains. Recent applications in
queueing theory [46,47] and management of hydrodams [41,42] have exhibited
Markov chain models with a "weakly-coupled" structure suitable for
perturbational analysis. In Chapter 5, after obtaining the general results,
we consider a class of controlled Markov models consisting of N
weakly-coupled groups of strongly-interacting states. Each group is under the
authority of a single decision maker having his own performance objective, and
the overall system is coordinated by a leader whose objective is to optimize
some global system performance. The problem considered is one where these N
decision makers are in Nash equilibrium among themselves and in Stackelberg
equilibrium with the leader. For the incentive design problem, it is shown
that near-optimal policies can be obtained from multiple reduced-order models.

The basic challenge in multimodeling is to identify the "core,"
where there is a strong interaction among all the decision makers, and other
low-order subproblems where the interactions are weak. This leads to the
possibility of decentralized strategy design by the decision makers using
several low-order models of the system. Such a decomposition need not be
based on time-scale considerations alone. In large scale systems, the
decision makers observe, in general, different variables through their
individual objective functionals. These observed variables play a crucial
role in the solution of the problem. In Chapter 6, we focus on the role of
the observed variables in multimodel strategy design. We attempt to identify
the core by examining the observability structure induced by the observation
sets of the decision makers. The system is represented in the observability
decomposition form using the techniques of chained aggregation [8,54,55].
By appropriately overlapping the input structure with the observability
decomposition, we identify a class of admissible strategies, referred to as
Structure-Preserving strategies, which generates multimodel solutions. The
information-induced multimodel solutions developed in Chapter 6 are shown
to admit partial noninteraction among the decision makers under certain
conditions which depend on the information pattern. Applications to the
control of large scale interconnected subsystems and multi-area power systems
are also discussed.

The thesis concludes with Chapter 7, where we summarize the results
obtained, outline the main contributions, and indicate directions for future
research.


CHAPTER  2 


MULTIMODEL  NASH  STRATEGIES  FOR  MULTIPARAMETER 
SINGULARLY  PERTURBED  SYSTEMS 


2.1.  Introduction 

Multimodel strategies for linear deterministic multiparameter
singularly perturbed systems have been obtained in [15,16] under the
assumption that the fast subsystems were weakly-coupled among themselves, and
that each fast subsystem was affected by the control of one decision maker only.
In this chapter we shall consider the general multiparameter game problem
wherein the fast subsystems need not be weakly-coupled and each fast
subsystem might be controlled by more than one decision maker. This problem
has been formulated in [17], and the ill-posedness of the limiting solution
has been demonstrated through some examples. This happens because now the
decision makers face game situations in both the fast and slow time-scales,
unlike in [15,16] where they faced a control problem in the fast time-scale.
In the sequel we shall demonstrate that multimodel generation by "k-th
parameter perturbation" is still well-posed provided each decision maker
solves his problem by the hierarchical reduction scheme of single parameter
games [21].

In Section 2.2 the problem is formulated and the exact solution is
given. In Section 2.3 a procedure is outlined to obtain decentralized
strategies from multimodel solutions. In Section 2.4, well-posedness of
the multimodel solution is established; and finally in Section 2.5, the
important conclusions drawn from the results of this chapter are summarized.


2.2.  Problem  Formulation 

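Consider the following linear system controlled by two decision
makers

$$\dot{x} = A_{00}x + \sum_{i=1}^{2} A_{0i}z_i + \sum_{i=1}^{2} B_{0i}u_i; \quad x(0) = x_0 \qquad (2.1a)$$

$$\varepsilon_i \dot{z}_i = A_{i0}x + A_{ii}z_i + A_{ij}z_j + B_{ii}u_i + B_{ij}u_j; \quad z_i(0) = z_{i0}; \quad i,j = 1,2;\ i \neq j \qquad (2.1b)$$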


where dim $x = n_0$, dim $z_i = n_i$, dim $u_i = m_i$. The small singular
perturbation parameters $\varepsilon_i$ represent small time-constants,
inertias, masses, etc. We consider the case when

$$m < \frac{\varepsilon_1}{\varepsilon_2} < M \qquad (2.2)$$

for some positive constants $m$ and $M$. Thus the set $H$ to which we restrict
the possible values of $\varepsilon$ is a sector in $R^2$. The matrices
$A_{ii}$ are assumed to be nonsingular. The cost functionals of the two
decision makers are


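$$J_i = \frac{1}{2}\int_0^\infty \left(\tilde{x}'Q_i\tilde{x} + u_i'R_{ii}u_i + u_j'R_{ij}u_j\right)dt; \quad i,j = 1,2;\ i \neq j. \qquad (2.3)$$

The usual definiteness assumptions are made on the weighting matrices
$Q_i$, $R_{ii}$, and $R_{ij}$.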


Notice that the i-th decision maker (DMi) penalizes only $z_i$ in his cost
functional, but not $z_j$. This is because his simplified model would neglect
$z_j$ under the multimodel situation. The decision makers select
$(u_1^*, u_2^*)$ such that

$$J_i(u_i^*, u_j^*) \leq J_i(u_i, u_j^*) \quad \text{for all admissible } u_i; \quad i,j = 1,2;\ i \neq j. \qquad (2.4)$$

The inequalities (2.4) define the Nash equilibrium.
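
The equilibrium condition (2.4) can be illustrated on a much simpler problem than the dynamic game of this chapter. The following is a minimal sketch, assuming a static two-player game with hypothetical quadratic costs `j1` and `j2` (these are not the cost functionals (2.3)); it checks numerically that neither player can lower his own cost by a unilateral deviation on a grid of candidate decisions.

```python
# Hypothetical static quadratic game, used only to illustrate inequality (2.4).
def j1(u1, u2):
    """Cost of player 1 (assumed form, not taken from the thesis)."""
    return (u1 - 1.0) ** 2 + u1 * u2


def j2(u1, u2):
    """Cost of player 2 (assumed form, not taken from the thesis)."""
    return (u2 - 2.0) ** 2 + u1 * u2


def is_nash(u1_star, u2_star, candidates, tol=1e-12):
    """Check both inequalities of (2.4) against a finite set of deviations."""
    ok1 = all(j1(u1_star, u2_star) <= j1(u1, u2_star) + tol for u1 in candidates)
    ok2 = all(j2(u1_star, u2_star) <= j2(u1_star, u2) + tol for u2 in candidates)
    return ok1 and ok2


# Grid of unilateral deviations in [-5, 5].
candidates = [k / 10.0 for k in range(-50, 51)]
```

For these assumed costs the best responses are $u_1 = 1 - u_2/2$ and $u_2 = 2 - u_1/2$, which intersect at $(0, 2)$; the grid check accepts that pair and rejects, for example, $(1, 1)$.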




The system model (2.1) is of interest in several cases. There
might be situations where subsystem characterization by time-scales does
not correspond to geographically distinct areas (in which case the fast
subsystems might not be weakly-coupled), and/or a mutual relocation of
controls among decision makers might not be possible due to the inherent
noncooperative nature of the problem.

The ill-posed nature of the usual order reduction method for the
problem (2.1)-(2.4) was demonstrated in [17] through some examples. This is
to be expected from past results on single parameter games [21-24], since
now the decision makers face game situations in both the fast and slow
time-scales, unlike in [15,16], where they had to solve only a control problem
at the fast subsystem level. This apparently minor modification in the
situation destroys the complete decoupling between the two low-order problems,
and forces one to look for noncausal reduced-order models which would yield
well-posed solutions.

The definitions of the various matrices that appear in the following
analysis are given in Appendix A. Restricting the control strategies to be
linear functions of the state, the optimal solution to (2.1)-(2.4) is given
by [11]

$$u_i^* = -R_{ii}^{-1}B_i'K_i\tilde{x}; \quad \tilde{x} = [x'\ z_1'\ z_2']' \qquad (2.5)$$

where $K_i$ is a stabilizing solution of the coupled Riccati equations

[Equation (2.6), stated for $i,j = 1,2$; $i \neq j$, is illegible in the source.]

Notice that since $A$ and $B_i$ are functions of $\varepsilon$, $K_i$ is also
a function of $\varepsilon$. In general, even for low-order problems the
presence of $\varepsilon$ causes numerical "stiffness" in (2.6). The optimal
cost of each player is given by

$$J_i^* = \frac{1}{2}\tilde{x}(0)'K_i(\varepsilon)\tilde{x}(0); \quad i = 1,2. \qquad (2.7)$$
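
For orientation, the coupled Riccati equations of a closed-loop linear-quadratic Nash game have a standard form, sketched below. This is an assumption rather than a transcription of (2.6): the matrices $S_i$ and $S_{ij}$ are taken here to be $B_iR_{ii}^{-1}B_i'$ and $B_jR_{jj}^{-1}R_{ij}R_{jj}^{-1}B_j'$ (the thesis defines its matrices in Appendix A, which is not reproduced in this section), and the signs are chosen so that the equation agrees with the Lyapunov equation (2.29) when the gains there are replaced by $K_1$ and $K_2$.

```latex
\begin{align*}
0 &= K_i A + A' K_i + Q_i - K_i S_i K_i - K_i S_j K_j - K_j S_j K_i + K_j S_{ij} K_j,
     \qquad i,j = 1,2;\ i \neq j, \\
S_i &= B_i R_{ii}^{-1} B_i', \qquad
S_{ij} = B_j R_{jj}^{-1} R_{ij} R_{jj}^{-1} B_j' .
\end{align*}
```

Under these assumptions, the closed-loop matrix is $A - S_1K_1 - S_2K_2$, and the term $K_jS_{ij}K_j$ is the contribution of $u_j'R_{ij}u_j$ to player i's cost.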


2.3.  Multimodel  Strategy  Design 

The notation $(\cdot)^{(i)}$ in the following formulation refers to the
quantities associated with DMi's simplified problem. DMi arrives at his
simplified model by neglecting the j-th fast subsystem, i.e., by setting
$\varepsilon_j = 0$ in (2.1). This gives

[Equation (2.8), which expresses $z_j^{(i)}$ in terms of the remaining variables, is missing from the source.]

Substituting (2.8) in (2.1) for $z_j$ results in DMi's simplified model

[Equation (2.9) is missing from the source.]

The cost functionals of the two DMs as viewed by DMi are obtained by
substituting (2.8) in (2.3)

[Equation (2.10), giving the cost functionals $J_i^{(i)}$ and $J_j^{(i)}$, is illegible in the source.]
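
The order-reduction step that leads to (2.8) and (2.9) can be sketched as follows. The block below is a hedged reconstruction, not the thesis's own display: it assumes that the j-th fast subsystem has the form $\varepsilon_j\dot{z}_j = A_{j0}x + A_{jj}z_j + A_{ji}z_i + B_{jj}u_j + B_{ji}u_i$ with $A_{jj}$ nonsingular, as suggested by the problem description in Section 2.2, and it omits the superscript $(i)$ for readability.

```latex
\begin{align*}
0 &= A_{j0}x + A_{jj}z_j + A_{ji}z_i + B_{jj}u_j + B_{ji}u_i
  \;\Longrightarrow\;
  z_j = -A_{jj}^{-1}\left(A_{j0}x + A_{ji}z_i + B_{jj}u_j + B_{ji}u_i\right), \\
\dot{x} &= \left(A_{00} - A_{0j}A_{jj}^{-1}A_{j0}\right)x
         + \left(A_{0i} - A_{0j}A_{jj}^{-1}A_{ji}\right)z_i
         + \left(B_{0i} - A_{0j}A_{jj}^{-1}B_{ji}\right)u_i
         + \left(B_{0j} - A_{0j}A_{jj}^{-1}B_{jj}\right)u_j, \\
\varepsilon_i\dot{z}_i &= \left(A_{i0} - A_{ij}A_{jj}^{-1}A_{j0}\right)x
         + \left(A_{ii} - A_{ij}A_{jj}^{-1}A_{ji}\right)z_i
         + \left(B_{ii} - A_{ij}A_{jj}^{-1}B_{ji}\right)u_i
         + \left(B_{ij} - A_{ij}A_{jj}^{-1}B_{jj}\right)u_j .
\end{align*}
```

Under these assumptions DMi's simplified model retains only $x$ and $z_i$ as states, and every coefficient matrix picks up a correction through $A_{jj}^{-1}$.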


We propose to solve the game (2.9)-(2.10) by the hierarchical reduction scheme
of [21], which transfers fast game information to a modified slow game.

The fast subsystem is derived by assuming that the slow variables
are constant during the fast transients:

$$\varepsilon_i\dot{z}_{if}^{(i)} = A_{ii}^{(i)}z_{if}^{(i)} + B_{ii}^{(i)}u_{if}^{(i)} + B_{ij}^{(i)}u_{jf}^{(i)}; \quad z_{if}^{(i)}(0) = z_{i0} - \bar{z}_i^{(i)}(0). \qquad (2.11)$$
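
The assumption that slow variables are constant during the fast transients, and that the fast variable then sits at its quasi-steady state, can be checked numerically on a toy example. The sketch below assumes a hypothetical scalar system $\dot{x} = -x + z$, $\varepsilon\dot{z} = x - 2z$ (not a system from the thesis). Setting $\varepsilon = 0$ gives $\bar{z} = x/2$ and the slow model $\dot{x}_s = -x_s/2$; the code compares the slow model with a forward-Euler simulation of the full system.

```python
import math


def simulate_full(eps, t_end=1.0, x0=1.0, z0=0.0):
    """Forward-Euler simulation of x' = -x + z, eps*z' = x - 2z (toy system)."""
    dt = eps / 50.0                      # step small enough to resolve the fast mode
    steps = int(round(t_end / dt))
    x, z = x0, z0
    for _ in range(steps):
        dx = -x + z
        dz = (x - 2.0 * z) / eps
        x, z = x + dt * dx, z + dt * dz
    return x, z


def reduced_slow(t, x0=1.0):
    """Slow model obtained with eps = 0: z_bar = x/2, hence x_s' = -x_s/2."""
    return x0 * math.exp(-0.5 * t)


def slow_error(eps):
    """Gap between the full slow state and the reduced model at t = 1."""
    x_full, _ = simulate_full(eps)
    return abs(x_full - reduced_slow(1.0))
```

In this example the gap shrinks roughly in proportion to `eps`, which is the kind of closeness that the perturbation analysis formalizes.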


The associated cost functionals are

[Equation (2.12), giving the fast cost functionals $J_{if}^{(i)}$ and $J_{jf}^{(i)}$, is illegible in the source.]

where $\bar{z}_i^{(i)}$ is found from (2.18).


The linear closed-loop Nash strategies for (2.11)-(2.12) are given
by

$$u_{if}^{(i)} = -M_{if}^{(i)}z_{if}^{(i)} \qquad (2.13a)$$

$$u_{jf}^{(i)} = -M_{jf}^{(i)}z_{if}^{(i)} \qquad (2.13b)$$

[The explicit expressions for the gains $M_{if}^{(i)}$ and $M_{jf}^{(i)}$ in (2.13), and the coupled Riccati equations (2.14a)-(2.14b) whose stabilizing solutions define them, are illegible in the source.]


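Next we make use of the fast controls (2.13) and substitute the following for
$u_i^{(i)}$ and $u_j^{(i)}$ in (2.9) and (2.10):

$$u_i^{(i)} = -M_{if}^{(i)}z_i^{(i)} + \bar{u}_i^{(i)} \qquad (2.15a)$$

$$u_j^{(i)} = -M_{jf}^{(i)}z_i^{(i)} + \bar{u}_j^{(i)} \qquad (2.15b)$$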


The new system and cost functionals are given by

[Equations (2.16a)-(2.16b), the system in terms of $\bar{u}_i^{(i)}$ and $\bar{u}_j^{(i)}$, and (2.17a)-(2.17b), the corresponding cost functionals, are illegible in the source.]

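Now we set $\varepsilon_i = 0$ in (2.16b) and solve for $\bar{z}_i^{(i)}$:

$$\bar{z}_i^{(i)} = -A_{ii}^{(i)-1}\left(A_{i0}^{(i)}x_s^{(i)} + B_{ii}^{(i)}u_{is}^{(i)} + B_{ij}^{(i)}u_{js}^{(i)}\right) \qquad (2.18)$$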


Substituting (2.18) for $z_i^{(i)}$ in (2.16a) and (2.17), the slow subsystem and
cost functionals are obtained as

[Equation (2.19), the slow subsystem in $x_s^{(i)}$, $u_{is}^{(i)}$, $u_{js}^{(i)}$ with $x_s^{(i)}(0) = x_0$, and equations (2.20a)-(2.20b), the slow cost functionals, are illegible in the source.]


Notice that the slow subsystem and the associated cost functionals contain
information about the fast game. The linear closed-loop Nash strategy for
(2.19)-(2.20) is given by

[Equations (2.21a)-(2.21b), the slow strategies $u_{is}^{(i)}$ and $u_{js}^{(i)}$, and the coupled Riccati equations (2.22a)-(2.22b) whose stabilizing solutions $K_{is}^{(i)}$ and $K_{js}^{(i)}$ define them, are illegible in the source.]


Hence, the composite strategies for the simplified game of DMi are given by

$$u_i^{(i)} = -M_{is}^{(i)}x - M_{if}^{(i)}z_i, \quad u_j^{(i)} = -M_{js}^{(i)}x - M_{jf}^{(i)}z_i; \quad i,j = 1,2;\ i \neq j. \qquad (2.23)$$

The decentralized multimodel strategy which the two decision makers
use on the full system (2.1), as obtained from the two simplified games, is
then given by

$$u_i^c = -M_{is}^{(i)}x - M_{if}^{(i)}z_i; \quad i = 1,2. \qquad (2.24)$$


Remarks:

1) The slow and fast subproblems are both game problems, and are
different for the two players. This is in contrast to the weakly-coupled
problem considered in [15,16], where only the fast subproblems, which were
control problems, were different for the two players, whereas the slow
subproblem, which was a game problem, was the same for both players.

2) The system and cost matrices of the slow subproblems of both
players contain information about their respective fast games, highlighting the
"anticipative" nature of low-order models in multiple decision maker problems.
This is again in contrast to the weakly-coupled case of [15,16], where the fast
and slow subproblems were solved independently.

3) The multimodel solution of [15,16] did guarantee the stability of
the overall system for all $\varepsilon$ in $H$, but the multimodel strategy
(2.24) obtained when the fast subsystems are not weakly-coupled does not
guarantee stability unless the coupling is limited (not necessarily weak).
Therefore, the following assumption is made:


Assumption A:

The solutions to the reduced games exist, and when the multimodel
strategy pair $(u_1^c, u_2^c)$ is applied to the original system (2.1), the
closed-loop system remains asymptotically stable for all $\varepsilon$ in $H$.


2.4. Asymptotic Properties of the Multimodel Strategy

In this section, we shall show that the multimodel strategy and the
resulting costs are well-posed in the sense that they tend to the optimal
strategy and costs, respectively, in the limit as the small parameters go to
zero.

The multimodel strategy (2.24) is put in a convenient form as
follows:

[Equations (2.25a), (2.25b), and (2.26), which express (2.24) through the gain matrices $L_1(\varepsilon)$ and $L_2(\varepsilon)$ for $i,j = 1,2$; $i \neq j$, are illegible in the source.]


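To avoid unboundedness in the solution of (2.6) as $\varepsilon \to 0$ in $H$,
and taking into consideration the symmetry of $K_i$ and the special forms of
$A$, $B_1$, $B_2$, we seek the solutions of (2.6) in the form

$$K_i(\varepsilon) = \begin{bmatrix} K_{00}^{(i)}(\varepsilon) & \varepsilon_1K_{01}^{(i)}(\varepsilon) & \varepsilon_2K_{02}^{(i)}(\varepsilon) \\ \varepsilon_1K_{01}^{(i)\prime}(\varepsilon) & \varepsilon_1K_{11}^{(i)}(\varepsilon) & \sqrt{\varepsilon_1\varepsilon_2}\,K_{12}^{(i)}(\varepsilon) \\ \varepsilon_2K_{02}^{(i)\prime}(\varepsilon) & \sqrt{\varepsilon_1\varepsilon_2}\,K_{12}^{(i)\prime}(\varepsilon) & \varepsilon_2K_{22}^{(i)}(\varepsilon) \end{bmatrix}; \quad i = 1,2. \qquad (2.27)$$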


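Theorem 2.1: The following relations hold under Assumption A:

[The limiting relations for the blocks $K_{00}^{(i)}(0)$, $K_{0i}^{(i)}(0)$, $K_{0j}^{(i)}(0)$, $K_{ii}^{(i)}(0)$, $K_{jj}^{(i)}(0)$, and $K_{12}^{(i)}(0)$ as $\|\varepsilon\| \to 0$, stated for $i,j = 1,2$; $i \neq j$, are illegible in the source.]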

Proof: The proof involves substituting (2.27) in (2.6) and taking the limit
as $\|\varepsilon\| \to 0$. The detailed manipulations are lengthy and are
omitted here for the sake of brevity.

Corollary 2.1: If the multimodel strategies exist, then

$$\lim_{\|\varepsilon\| \to 0}\left(L_i(\varepsilon) - K_i(\varepsilon)\right) = 0.$$

Proof: The result is an immediate consequence of Theorem 2.1.

When the multimodel strategy $(u_1^c, u_2^c)$ is applied to (2.1), the
resulting cost is given by

$$J_i^c = \frac{1}{2}\tilde{x}(0)'V_i(\varepsilon)\tilde{x}(0); \quad i = 1,2; \qquad (2.28)$$

where $V_i(\varepsilon)$ satisfies the Lyapunov equation

$$V_i(A - S_1L_1 - S_2L_2) + (A - S_1L_1 - S_2L_2)'V_i + Q_i + L_i'S_iL_i + L_j'S_{ij}L_j = 0. \qquad (2.29)$$

By Assumption A, $V_i(\varepsilon)$ exists and is positive definite for all
$\varepsilon$ in $H$.
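
The way a Lyapunov equation such as (2.29) yields the cost of a fixed linear feedback can be seen in the scalar case. The sketch below assumes a hypothetical single-input scalar system $\dot{x} = ax + bu$ with $u = -lx$ and cost $\frac{1}{2}\int_0^\infty (qx^2 + ru^2)\,dt$ (a one-player analogue, not the two-player problem of this chapter). The scalar Lyapunov equation is $2(a - bl)v + q + rl^2 = 0$, and the cost is $\frac{1}{2}vx_0^2$; the second function checks this against direct numerical integration.

```python
import math


def lyapunov_cost_scalar(a, b, q, r, gain, x0):
    """Cost 0.5*v*x0**2, with v solving 2*(a - b*gain)*v + q + r*gain**2 = 0."""
    a_cl = a - b * gain
    if a_cl >= 0.0:
        raise ValueError("closed loop must be asymptotically stable")
    v = -(q + r * gain ** 2) / (2.0 * a_cl)
    return 0.5 * v * x0 ** 2


def simulated_cost_scalar(a, b, q, r, gain, x0, t_end=10.0, dt=1e-3):
    """Midpoint-rule integral of 0.5*(q*x**2 + r*u**2) along x(t) = x0*exp(a_cl*t)."""
    a_cl = a - b * gain
    total = 0.0
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = (k + 0.5) * dt
        x = x0 * math.exp(a_cl * t)
        u = -gain * x
        total += 0.5 * (q * x * x + r * u * u) * dt
    return total
```

With `a = b = q = r = 1`, `gain = 3`, and `x0 = 1`, the closed-loop pole is at $-2$, so $v = 10/4$ and both functions give a cost close to 1.25. The stability check mirrors the role of Assumption A: without a stable closed loop the cost integral diverges and the Lyapunov solution has no meaning as a cost.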


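Lemma 2.1:

$$J_i^c = J_i^* + O(\|\varepsilon\|); \quad i = 1,2, \quad \forall \varepsilon \text{ in } H.$$

Proof: Subtracting (2.6) from (2.29) and letting $W_i = V_i - K_i$, we get

$$W_i(A - S_1L_1 - S_2L_2) + (A - S_1L_1 - S_2L_2)'W_i + (K_i - L_i)'S_i(K_i - L_i) + (K_j - L_j)'S_{ij}(K_j - L_j) + (K_iS_j - K_jS_{ij})(K_j - L_j) + (K_j - L_j)'(S_jK_i - S_{ij}K_j) = 0. \qquad (2.30)$$

From Corollary 2.1 and Assumption A, we get

$$\lim_{\|\varepsilon\| \to 0} W_i = 0; \quad i = 1,2$$

and hence

$$J_i^c = J_i^* + O(\|\varepsilon\|); \quad i = 1,2, \quad \forall \varepsilon \text{ in } H.$$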


We have proposed that the multimodel strategy $(u_1^c, u_2^c)$ be used as an
approximation of the exact Nash strategy $(u_1^*, u_2^*)$. It is not clear at
this point why decision makers who are interested in a Nash strategy should
use the multimodel strategy. The exact Nash strategy $(u_1^*, u_2^*)$
satisfies inequality (2.4), which guarantees that neither decision maker can
reduce his cost functional by unilaterally deviating from $(u_1^*, u_2^*)$.
Unfortunately, the multimodel strategy does not possess this property, and
hence it is necessary to establish its near-equilibrium property [20]. We
have shown that the resulting costs of the multimodel strategy are
$O(\|\varepsilon\|)$ close to their Nash equilibrium values. However,
closeness of the costs alone is not sufficient.
If player i uses $u_i^c$, player j solves an optimal control problem in
$u_j$. The strategy $u_j^c$ must be a near-optimal strategy for this optimal
control problem; otherwise player j would have no motive for using $u_j^c$.
This guarantees that the j-th player cannot reduce his cost by more than
$O(\|\varepsilon\|)$ if he unilaterally deviates from $(u_1^c, u_2^c)$.
Hence, practically, the players have no motive for cheating. This, however,
is not a guarantee against cheating. It is quite possible that the j-th
player deviates from $u_j^c$ and uses another strategy $\hat{u}_j$ that
reduces his cost, no matter how insignificant the reduction is, but in
doing so hurts the other player by causing a substantial increase in $J_i$.
Hence, for $(u_1^c, u_2^c)$ to be a near-equilibrium strategy pair, it must
be true that any $\hat{u}_j$ that results in
$J_j(u_i^c, \hat{u}_j) < J_j(u_i^c, u_j^c)$ cannot increase $J_i$ by more
than $O(\|\varepsilon\|)$. The definition of a near-equilibrium strategy as
given in [20] does not require the existence of a Nash equilibrium strategy.
Here we shall show that the proposed multimodel strategy is not just
near-equilibrium Nash but, being $O(\|\varepsilon\|)$ close to
$(u_1^*, u_2^*)$, is also asymptotic Nash.

Define the set of admissible strategies for player 1, when player 2
uses $u_2^c$, as the set of linear feedback strategies of the form

$$u_1 = -F_1(\varepsilon)\tilde{x} = -\left(F_{10}(\varepsilon)x + F_{11}(\varepsilon)z_1 + F_{12}(\varepsilon)z_2\right) \qquad (2.31)$$

such that the closed-loop matrix $(A - B_1F_1 - S_2L_2)$
is stable for all $\varepsilon$ in $H$. To avoid mathematical complications,
the feedback matrices of (2.31) are restricted to be of the form

$$F_{1i}(\varepsilon) = \bar{F}_{1i} + O(\|\varepsilon\|); \quad i = 0,1,2. \qquad (2.32)$$

Denote this set by $U_1$. The set of admissible strategies for player 2 when
player 1 uses $u_1^c$ is similarly defined and is denoted by $U_2$.

The following lemma is needed to establish the near-equilibrium
Nash property of the multimodel strategy.

Lemma 2.2:

$$J_1(u_1, u_2^c) - J_1(u_1, u_2^*) = O(\|\varepsilon\|); \quad \forall u_1 \in U_1,\ \varepsilon \text{ in } H.$$

Proof: Let

$$J_1(u_1, u_2^*) = \frac{1}{2}\tilde{x}'(0)T_1\tilde{x}(0) \qquad (2.33)$$

where $T_1$ satisfies

$$T_1(A - B_1F_1 - S_2K_2) + (A - B_1F_1 - S_2K_2)'T_1 + Q_1 + F_1'R_{11}F_1 + K_2'S_{12}K_2 = 0 \qquad (2.34)$$

and

$$J_1(u_1, u_2^c) = \frac{1}{2}\tilde{x}(0)'P_1\tilde{x}(0) \qquad (2.35)$$

where $P_1$ satisfies

$$P_1(A - B_1F_1 - S_2L_2) + (A - B_1F_1 - S_2L_2)'P_1 + Q_1 + F_1'R_{11}F_1 + L_2'S_{12}L_2 = 0. \qquad (2.36)$$

Subtracting (2.34) from (2.36) and letting $N_1 = P_1 - T_1$, we get

$$N_1(A - B_1F_1 - S_2L_2) + (A - B_1F_1 - S_2L_2)'N_1 + T_1S_2(K_2 - L_2) + (K_2 - L_2)'S_2T_1 + (K_2 - L_2)'S_{12}(K_2 - L_2) + K_2'S_{12}(L_2 - K_2) + (L_2 - K_2)'S_{12}K_2 = 0. \qquad (2.37)$$

From Corollary 2.1, and knowing the stability of $(A - B_1F_1 - S_2L_2)$,

$$\lim_{\|\varepsilon\| \to 0} N_1 = 0$$

which proves Lemma 2.2.


The following two theorems establish the near-equilibrium property of
the multimodel strategy.

Theorem 2.2:

$$J_i(u_i^c, u_j^c) \leq J_i(u_i, u_j^c) + O(\|\varepsilon\|); \quad \forall u_i \in U_i,\ \varepsilon \text{ in } H; \quad i,j = 1,2,\ i \neq j$$

i.e., the multimodel strategy is almost secure against cheating.

Proof: We have

$$J_1(u_1^c, u_2^c) - J_1(u_1, u_2^c) = J_1(u_1^c, u_2^c) - J_1(u_1^*, u_2^*) + J_1(u_1^*, u_2^*) - J_1(u_1, u_2^c).$$

Since $J_1(u_1^*, u_2^*) \leq J_1(u_1, u_2^*)$, we get

$$J_1(u_1^c, u_2^c) - J_1(u_1, u_2^c) \leq J_1(u_1^c, u_2^c) - J_1(u_1^*, u_2^*) + J_1(u_1, u_2^*) - J_1(u_1, u_2^c).$$

From Lemma 2.1 and Lemma 2.2, it follows that

$$J_1(u_1^c, u_2^c) \leq J_1(u_1, u_2^c) + O(\|\varepsilon\|).$$

This proves the theorem for $i = 1$, $j = 2$. The other case is similar.

Theorem  2.3: 

a:  Jj^(u2.tt®)+0(ti£ll); 

YUjCiij  such  that  Jj(u®,Uj)  SJj(u®,Uj) 

VC  eH;  l.j -1,2;  iflj 

Proof;  We  prove  for  i-2,  j  -  1.  The  other  case  is  similar.  Suppose  player-2 
uses  U2  -  optimal  reaction  of  player-l  is  given  by 

u*  -  (2.38) 


resulting  in 


(U 


c. 

l’"2> 


i  i (0) 'M^x(O) 


(2.39) 


where  satisfies 

M^(A  -  S  j^Mj^  -  S2L2)  +  (A  -  S^Mj^  -  S2L2)  +Qj^  +  qSj2l'2  -  0- 

(2.40) 

Subtracting  (2.40)  from  (2.29)  for  i-1,  j*2  and  letting  $ »  we  have 

^(A  ■■  “  S2l<2)  +  (A  -  ”  S2l2^'^  ■  M^)s  0. 

(2.41) 

It  follows  from  Theorem  2.2  that 

Jl(i][,u®)  -  J^(u®,u|)  -O(llell)  (2.42) 


or 


liffl  i(e)  •  0 

lle||-0 


(2.43) 
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and  hence  to  satisfy  (2.41)  we  should  have 

lim  (L,  -M.) -0.  (2.44) 

llell-0  ^  ^ 

A  A 

Let  be  any  strategy  such  that 

J^Cu^.u®)  -  i  i(0)‘D^x(0)  s:  Jj^(u®,u®)  -  |  ^(0) 'Vj^^(O) 

(2.45) 

satisfies  the  Lyapunov  equation 

Dj^(A  -  -  SjLj)  +  (A-  -  Bj^Fj^  -  ’Dj^  +^1  +F[BiiFj^  *  °  * 

(2.46) 

Subtracting  (2.40)  from  (2.46),  satisfies 

Yj^(A-Bj^F^-S2L2^‘*‘(^"®l^l'^24^''‘^l'*‘*“°  '* 

where 

From  (2.45)  it  follows  that 

0  s  x(0)''lfj^(e)x(0)  i  x(0)»«(E)x(0)  .  (2.49) 

Hence, due to (2.43) we get

$\lim_{\|\varepsilon\| \to 0} Y_1(\varepsilon) = 0$  (2.50)

and therefore, from (2.47) it follows that

$\lim_{\|\varepsilon\| \to 0} (R_{11}^{-1} B_1' M_1 - F_1) = 0.$  (2.51)

Equations (2.44) and (2.51) show that any strategy satisfying (2.45) must
satisfy

$\lim_{\|\varepsilon\| \to 0} (R_{11}^{-1} B_1' L_1 - F_1) = 0.$  (2.52)


$J_2(u_1,u_2^m) = \tfrac{1}{2}\, x(0)' D_2 x(0), \quad J_2(u_1^m,u_2^m) = \tfrac{1}{2}\, x(0)' V_2 x(0);$  (2.53)

where $D_2$ satisfies the Lyapunov equation

DjCA  -  -  S^Lj)  +  (A  -  -  $24)  *02  +Q2  +^1^21^1  ‘'’4^24  “  °  ‘ 


(2.54) 


Subtracting (2.29) for i = 2, j = 1 from (2.54) and letting $Y_2 = D_2 - V_2$, we obtain

Y2(A  -  Bj^Fjl  -  S2L2)  +  (A  -  Bj^Fj^  -  S2L2)  •'fj  +V2Bl(hi®i^l  ‘  '®i^2 

+  (R-Jb-L^  -  F^)  'RjiCRliBlLj  -  Fj)  +L{B^R-iRji(r^  -  R-Jb^L^ 

+  (Fj^  -  R^iB^Lj^)  '^21®!!®!^!  *  °  *  (2.55) 


From (2.52) and knowing the stability of $(A - B_1F_1 - S_2L_2)$ it follows that

$\lim_{\|\varepsilon\| \to 0} Y_2 = 0$  (2.56)

which proves the theorem for i = 2, j = 1.
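The proof above repeatedly evaluates a quadratic cost as one half of x(0)'Vx(0), with V obtained from a Lyapunov equation for the closed-loop matrix. The following is a minimal scalar sketch of that identity, not the thesis's own computation: it assumes a single state, a single player with a linear feedback u = -kx, and hypothetical names and numbers. It compares the Lyapunov value with a direct numerical integration of the cost.

```python
import math


def lyapunov_cost_scalar(a_cl, q, r, k, x0):
    """Cost 0.5 * v * x0**2, where v solves 2*a_cl*v + q + r*k**2 = 0.

    a_cl is the (scalar) closed-loop coefficient, assumed stable (a_cl < 0).
    """
    if a_cl >= 0.0:
        raise ValueError("closed loop must be stable (a_cl < 0)")
    v = -(q + r * k * k) / (2.0 * a_cl)
    return 0.5 * v * x0 * x0


def integrated_cost_scalar(a_cl, q, r, k, x0, horizon=40.0, steps=400000):
    """0.5 * integral of (q*x^2 + r*u^2) dt along x(t) = x0*exp(a_cl*t), u = -k*x.

    Uses the midpoint rule on [0, horizon]; the horizon is an illustrative
    truncation of the infinite-horizon integral.
    """
    dt = horizon / steps
    total = 0.0
    for n in range(steps):
        t = (n + 0.5) * dt
        x = x0 * math.exp(a_cl * t)
        u = -k * x
        total += (q * x * x + r * u * u) * dt
    return 0.5 * total


# Hypothetical data: open-loop a, input gain b, weights q, r, feedback gain k.
a, b, q, r, k, x0 = 1.0, 1.0, 2.0, 1.0, 3.0, 1.5
a_cl = a - b * k  # closed-loop coefficient, here -2

cost_from_lyapunov = lyapunov_cost_scalar(a_cl, q, r, k, x0)
cost_from_integral = integrated_cost_scalar(a_cl, q, r, k, x0)
print(cost_from_lyapunov, cost_from_integral)
```

In the matrix case the same role is played by equations such as (2.46) and (2.54), where the closed-loop matrix and the weighting terms replace the scalars used here.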


By a simple modification, the multimodel strategy (2.24) can be reformulated
as a linear function of the slow state alone. To obtain DMi's modified
multimodel strategy, we substitute (2.21) into (2.18) to give


1  li  io  ii  is  ij  js  s 


(2.57) 


Substituting  (2.57)  in  (2.24)  for  we  obtain, 

"  "^^is  '^if  ^il  ^^io  '®li  ^is  -®i1  ^is  ^J*s  • 


(2.58) 
This can be factorised to put into a convenient form,


+(Aj^2^22^ 

-  -RiiB'L^x  ; 


0  X 


0  z. 


0  z. 


(2.59a) 


"^2^®62  ^12^^!  ®22^^2^ 


•ejitCAl 


or  11^ 


+(A2iAiJ)  J 
^2^« 

•  ”R2^2^2*  i 


(2.59b) 


where 


K  -K  -  i,j=1.2;  ii* j . 

im  im  io  ii  Is  ij  js  ii  if  ^ 

(2.60)

The resulting cost, when the modified multimodel strategy is applied to
(2.1), can be written as


“  I  5  ^*^*2  j 


(2.61) 


where $V_{im}$ satisfies the Lyapunov equation

$V_{im}(A - S_1L_1 - S_2L_2) + (A - S_1L_1 - S_2L_2)'V_{im} + Q_i + L_i'S_iL_i + L_j'S_{ij}L_j = 0; \quad i,j = 1,2;\ i \ne j.$  (2.62)

If the matrix

$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$

is block D-stable [18], then from Assumption A it follows that
$(A - S_1L_1 - S_2L_2)$ is stable for all $\varepsilon$ in H; and hence $V_{im}$
exists and is positive definite for all $\varepsilon$ in H. Following the methods
used earlier, it can be shown that

$J_{im} = J_i^* + O(\|\varepsilon\|); \quad i = 1,2;\ \forall \varepsilon$ in H.  (2.63)

The modified multimodel strategy also possesses the near-equilibrium property.
This is true because

$J_i(u_i,u_{jm}) - J_i(u_i,u_j^*) = O(\|\varepsilon\|); \quad \forall u_i \in U_i,\ \varepsilon$ in H; $i,j = 1,2;\ i \ne j.$  (2.64)

The above fact follows directly from the discussions in [25], and Lemma 2.2.

Hence, together with (2.63) and (2.64) we establish the near-equilibrium
property of the modified multimodel strategy, namely,

$J_i(u_{im},u_{jm}) \le J_i(u_i,u_{jm}) + O(\|\varepsilon\|); \quad \forall u_i \in U_i,\ \varepsilon$ in H; $i,j = 1,2;\ i \ne j,$  (2.65)


(2.66) 

Ve  In  H;  i,j  -  1,2;  1  j 


Finally, we would like to remark that though the approximate strategies
derived in this paper possess near-equilibrium and asymptotic Nash
properties, the resulting state trajectories are within $O(\|\varepsilon\|)$ of the
optimal trajectories only outside some boundary-layer.


2.5.  Conclusions


In this chapter a procedure has been formulated to obtain decentralized
strategies under a multimodel situation. The proposed strategies are
near-equilibrium and asymptotic Nash. The subsystem classification was
based on time-scale separation, which allowed the system to be modeled with
multiparameter singular perturbations. The weak-coupling assumption made on
the fast subsystems in [15,16] was removed. This apparently minor
modification in the model structure changed completely the multimodel
solution procedure. The reduced games for the two players became completely
different, in contrast to the problem in [15,16] where the two reduced games
were only partially different, the difference being in the fast control
problems; the slow game problems being identical for both the players.
Moreover, the multimodel solution in [15,16] guaranteed the stability of the
overall system for all $\varepsilon$ in H; but the multimodel solution proposed
here, in the absence of weak-coupling, failed to guarantee the stability of
the overall system unless the coupling between the fast subsystems is limited
(not necessarily weak). In the case when the boundary-layer system is
asymptotically stable for all $\varepsilon$ in H (block D-stable), a procedure is
given to modify the multimodel strategies to obtain strategies which are
linear functions of the slow state alone. These modified strategies are also
near-equilibrium and asymptotic Nash.


CHAPTER  3 


A  MULTIMODEL  APPROACH  TO  STOCHASTIC  NASH  GAMES 


3.1.  Introduction 

In this chapter we establish the well-posedness of multimodel
generation by "k-th parameter perturbation" for a class of stochastic Nash
games with a prespecified finite-dimensional compensator structure for each
decision maker. The weak-coupling assumption is retained to keep the analysis
simple, and to focus on the stochastic aspects of the problem.

In Section 3.2 we formulate the problem and raise some crucial
questions. Section 3.3 demonstrates multimodel generation. In Section 3.4
we establish the weak limit of the fast stochastic variable. In Section 3.5
we solve the slow subproblem and in Section 3.6 we solve the fast subproblems.
In Section 3.7 we examine the limiting behavior of the exact solution and
establish the well-posedness of the multimodel solution. Finally, in Section
3.8, we conclude the chapter by summarizing the main results.


3.2.  Problem  Formulation 

A linear stochastic system consisting of a strongly-coupled slow
core and weakly-coupled fast subsystems controlled by two decision makers
is modeled by


$\dot z_0 = A_{00} z_0 + \sum_{j=1}^{2} A_{0j} z_j + \sum_{j=1}^{2} B_{0j} u_j + L_0 w; \quad z_0(0) = z_{00}$  (3.1a)


'i"l  ■  \o^c*  "l"'’  ■  'lo’ 


l,j-l,2;  l#j  . 


(3.1b) 


with  the  observation  vectors  for  each  decision  maker  given  by 


$y_{0i} = C_{0i} z_0 + v_{0i}$  (3.2a)


>'u  •  =11^  * 


(3.2b) 


where $\dim z_0 = n_0$, $\dim z_i = n_i$, $\dim u_i = m_i$, $\dim y_{0i} = p_{0i}$, $\dim y_{ii} = p_{ii}$; i = 1,2.

The  processes  w,  are  assumed  to  be  independent  white  Gaussian  with 

covariances  W,  respectively,  with  positive  definite  and 

The  initial  conditions  are  assumed  to  be  Gaussian  with 


I 


E[(2.  -Z  )(z.  -Z,  )'l 
io  io  jo  jo 


i,j-0.1,2. 


(3.3) 


The small singular perturbation parameters $\varepsilon_i$ represent small
time-constants, inertias, masses etc.; while the small regular perturbation
parameters $\varepsilon_{ij}$ represent weak-coupling between the fast subsystems.
The states $z_i$ are "fast" since their derivatives are of order
$1/\varepsilon_i$. The matrices $A_{ii}$ are nonsingular.

The main idea behind inserting the $\sqrt{\varepsilon_i}$ factor multiplying the
white noise terms in the state and observation equations for the variables
$z_1$, $z_2$ is to make them meaningful fast variables for control and
estimation purposes. Without $\sqrt{\varepsilon_i}$ in the state equation, the
variable $z_i(t)$ tends to a white noise vector with infinite variance
parameter as $\varepsilon_i \to 0$. If this factor is dropped from the observations
equation, then $z_i(t)$ cannot be estimated meaningfully because the
signal-to-noise ratio tends to zero as $\varepsilon_i \to 0$. A more complete
discussion about the use and justification of this model can be found in [33].
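One way to see the effect of the square-root scaling is a scalar fast equation, eps*dz = a*z*dt + c*l*dw with a < 0, where c is either sqrt(eps) or 1. The sketch below is an illustration under those assumptions (scalar state, stationary regime, hypothetical function names and numbers), not the model of [33]. It computes the stationary variance in both cases: with the factor the variance stays fixed as eps shrinks, and without it the variance grows like 1/eps.

```python
import math


def stationary_variance(a, g, w):
    """Stationary variance p of dz = a*z*dt + g*dw with E[dw^2] = w*dt.

    p solves the scalar Lyapunov equation 2*a*p + g*g*w = 0 (requires a < 0).
    """
    if a >= 0.0:
        raise ValueError("a must be negative for a stationary variance to exist")
    return -(g * g * w) / (2.0 * a)


def fast_variance(a_ii, l_i, w, eps, scaled=True):
    """Stationary variance of eps*dz = a_ii*z*dt + c*l_i*dw.

    c = sqrt(eps) when scaled is True (the scaling discussed in the text),
    c = 1 otherwise. Dividing through by eps gives drift a_ii/eps and
    noise gain c*l_i/eps.
    """
    c = math.sqrt(eps) if scaled else 1.0
    return stationary_variance(a_ii / eps, c * l_i / eps, w)


# Hypothetical numbers for illustration only.
a_ii, l_i, w = -2.0, 1.0, 4.0
for eps in (0.1, 0.01, 0.001):
    with_factor = fast_variance(a_ii, l_i, w, eps, scaled=True)
    without_factor = fast_variance(a_ii, l_i, w, eps, scaled=False)
    print(eps, with_factor, without_factor)
```

With these numbers the scaled variance stays at l_i^2 * w / (2|a_ii|) for every eps, while the unscaled one is that value divided by eps.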


The  cost  functionals  of  the  two  decision  makers  are  given  by 


+  u^R^u^)dt}  ;  i“l,2.  (3.4) 


The equilibrium solution to the stochastic zero-sum game under
general information structures has been obtained in [26,29]. The solution
has been shown to require infinite-dimensional compensators which are not
practical to implement. Although the general solution to the nonzero-sum
Nash game has not yet appeared in the literature, it appears, however, that
infinite-dimensional compensators would still be required. In such a case,
one can either make specific assumptions regarding the information structures
of the two players, under which the required compensators turn out to be
finite-dimensional dynamic systems [28]; or solve the problem under the
formal restriction that each player is limited to a compensator of fixed
dimension, the output of which is all that is available to him in the
generation of his control at that time [35].

Our intention here is not to solve the general LQG Nash game, but
to obtain approximate limiting strategies for a given solution methodology.
For this purpose, we extend the results of [35] for the constrained
estimator problem to two-person nonzero-sum LQG Nash games, and based on this
solution methodology obtain the limiting strategies under a multimodel
situation. Our motivation in taking the above approach is that
finite-dimensional estimators are practical to implement, and possess some
nice properties.


The definitions of the various matrices that appear in the
following analysis are given in Appendix B. Defining $x = [z_0'\ z_1'\ z_2']'$,
$y_i = [y_{0i}'\ y_{ii}']'$, $v_i = [v_{0i}'\ v_{ii}']'$, the system of equations
(3.1)-(3.4) can be written in a composite form as
$\dot x = Ax + \sum_{j=1}^{2} B_j u_j + Lw; \quad x(0) = x_0$  (3.5)

$y_i = C_i x + v_i; \quad i = 1,2.$  (3.6)

$E[x_0] = \bar x_0; \quad E[(x_0 - \bar x_0)(x_0 - \bar x_0)'] = N_0$  (3.7)

where $\dim x = n = n_0 + n_1 + n_2$, $\dim y_i = p_i = p_{0i} + p_{ii}$.

$J_i = \tfrac{1}{2}\{x'(T)\Gamma_i x(T) + \int_0^T (x'Q_i x + u_i'R_i u_i)\,dt\}; \quad i = 1,2.$  (3.8)

Each decision maker is constrained to use only an n-dimensional compensator
of the form


^1*1  ®l“l’ 


1-1,2, 


(3.9) 


The decision makers are required to select the matrices $F_i^*$, $G_i^*$,
$H_i^*$, the initial conditions $\hat x_i^*(0)$, and the closed-loop control laws
$u_i^*(\hat x_i(t),t)$, such that

$E[J_i(u_i^*,u_j^*)\,|\,X_i] \le E[J_i(u_i,u_j^*)\,|\,X_i]; \quad i,j = 1,2;\ i \ne j$  (3.10)

where $X_i$ denotes a combination of $\hat x_i(t)$ and the a-priori information.

The pair of inequalities in (3.10) defines the Nash equilibrium
for the problem (3.5)-(3.9).

To  solve  the  problem  posed  in  equations  (3.5)-(3.10) ,  we  need  the 
following  result  which  is  a  generalization  of  [35]  for  the  nonzero-sum  case. 


Theorem 3.1: A sufficient condition for two closed-loop control laws
$(u_1^*, u_2^*)$ to be a Nash pair for the problem defined by (3.5)-(3.10) is that
there exist real-valued functions $I_i(x,t)$, differentiable in each variable,
which together with $u_1^*$ and $u_2^*$ satisfy for all $t \in [0,T]$ the following
conditions:

Defining for all $t \in [0,T]$ the scalar functions $S_i$ by

2 

S^(x,Uj^,U2,t)  -  I^^(x,t)  +  I^^(x,t)  [Ax  +  ^E^BjU^ +Lw] 

+  i  x-Q,x  +  i 

$\min_{u_i} E\{S_i(x,u_i,u_j^*,t)\,|\,X_i(t)\} = 0$

$E\{S_i(x,u_i^*,u_j^*,t)\,|\,X_i(t)\} = 0$

$I_i(x,T) = \tfrac{1}{2}\, x'\Gamma_i x$

$i,j = 1,2;\ i \ne j.$

Applying  Theorem  3.1,  the  solution  to  the  full  problem  (3.5)-(3.10)  is  given 
by 


(3.11a) 

FJ  -  A-  (I+ 

(3.11b) 

(3.11c) 

-“l 

(3. lid) 

x*(0)  ■  x^ 

(3. lie) 

where $K_i$ satisfies the coupled Riccati equation


+  K^A  +  A\+  -  0 ;  K^  (T)  -  . 


M(t) is a symmetric nonnegative definite matrix defined as


M(t)  *  E{m(t)m'(t)}  ;  m(t)  ■  x-x 


satisfying  the  differential  equation 


M  -  FM  +  MF'  +  B®B'  , 


with  M . .  (0)  “X  X  ’  +  N ; 
ij  o  o 


i- j  -0 


elsewhere . 


The  following  relations  can  be  readily  derived 


$E\{x(t)\,|\,X_i(t)\} = \hat x_i(t)$  (3.15)

$E\{(x(t) - \hat x_i(t))\,\hat x_i'(t)\} = 0$  (3.16)


E{Xj(t)|x^(t)}  -  [1+ (Mj^-Mj^)(M^^-M^^)"-^]x^(t) 


M  a  M'  a  M'  a  M 
io  oi  ii  ii 


I^(x,t)  a  i  x'K^x  +  Y  b^(t) 


& 

b.  (t)  a  tr{/  (K.S.K.M,, +K,S.K.M.  +K,S,K,M  Jdt} 
i  i  i  i  ii  i  3  j  jo  j  j  i  oj 

E{jJ|X^}  a  i  [x^(0)K^(0)x^(0)  +  tr{M^^(0)K^(0)}  +  b^(0)] 


Notice  that  the  optimal  control  gains  are  independent  of  the  filter  matrices 
and  covariances;  but  the  optimal  filter  matrices  and  covariances  depend  on 


the  control  gains  resulting  in  a  "dual  effect"  which  is  optimized  with 
respect  to  the  given  filter  structure  and  the  cost  functionals. 

The  linear  strategy  (3.11a)  is  the  unique  Nash  strategy  for  the 
above  problem.  Nonuniqueness  does  not  arise  because  it  is  not  possible  to 
express  at  time  t,  in  terms  of  the  values  of  x^  from  0  to  t,  due  to  the 
presence  of  white-noise-corrupted  measurements  (3.6)  [27]. 

The  following  assumptions  are  made  throughout. 

Assumption a: $\mathrm{Re}\,\lambda(A_{ii}) < 0$; i = 1,2.

Assumption  b :  The  triple  controllable-observable. 

From the solution obtained above, it is clear that the optimal
finite-dimensional compensators are not Kalman filters, and hence the earlier
results [30-34] on filtering and control of linear stochastic singularly
perturbed systems do not apply here. A number of important questions now arise.
What is the limiting structure of these finite-dimensional compensators as
the small parameters go to zero? Does the full order compensator decompose
into a number of decoupled low-order compensators? Does the resulting limiting
structure offer any computational and/or implementational advantages? Is it
possible to obtain a near-equilibrium solution based on the solution of
low-order problems as in the deterministic case [15,16]?

It is our intention here to answer the above questions. Specifically,
we shall show that the multimodel solution is the asymptotic limit of the
exact solution as the small parameters go to zero. To obtain the multimodel
solution, we first need to derive the simplified model used by each decision
maker.

3.3.  Multimodel  Generation

DMi arrives at his simplified model by neglecting the dynamics of
the j-th fast subsystem and the weak interactions between the two fast
subsystems, i.e., by setting $\varepsilon_j = 0$ on the left hand side of (3.1) and
$\varepsilon_{12} = \varepsilon_{21} = 0$ in (3.1). The steady state dynamics of the
j-th fast subsystem is then given by the algebraic equation


lj(c)  -  +  + 


(3.22) 


The above expression for $z_j(t)$ has been shown to be valid as input to slow
systems [31]. Therefore, substituting (3.22) in (3.1), (3.2) results in the
following simplified model for the ith decision maker.


£(i)  »  +  A  z.^^^  +  B  u  +B  u  +L^^^w; 

0  00  ol  1  js  j  ol  1  0 


z^^ho)~z  .(3.23a) 
o  00 


.(1) 


(i) 


(i) 


(3.23b) 


The  observation  vectors  for  the  two  players  are  given  by 

.(i) 


.(i) 


ol 
0  C 


0 
IIJ 


o 

L*1  J 


+  V, 


(3.24a) 


-  c  z^^^  +  D  u  +  V 
J  js  o  js  j  js’ 


(3.24b) 


Notice that in the above simplified model used by DMi, the two decision
makers do not interact at the fast subsystem level, but interact only at the
slow subsystem level. Therefore, to obtain the multimodel solution, DMi needs
only to know the parameters associated with the model (3.23), (3.24); an
exact knowledge of the full model (3.1), (3.2) is not required. The
multimodel solution is then obtained by solving three low-order problems: two
independent stochastic control problems, one for each decision maker at the
fast subsystem level; and a constrained stochastic Nash game at the slow
subsystem level.
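The reduction step used above, setting the small parameter to zero on the left hand side and eliminating the fast state algebraically, can be sketched on a noise-free scalar example. This is a simplified illustration and not the stochastic model (3.23): it assumes one slow state x and one fast state z with hypothetical coefficients, dx/dt = a00*x + a01*z + b0*u and eps*dz/dt = a10*x + a11*z + b1*u. It checks that the slow eigenvalue of the full system approaches the reduced coefficient as eps shrinks.

```python
import math


def reduced_slow_model(a00, a01, a10, a11, b0, b1):
    """Quasi-steady-state reduction: set eps = 0, solve for z, substitute.

    From 0 = a10*x + a11*z + b1*u we get z = -(a10*x + b1*u)/a11, so
    dx/dt = a_s*x + b_s*u with the coefficients returned below.
    """
    if a11 == 0.0:
        raise ValueError("a11 must be nonzero to eliminate the fast state")
    a_s = a00 - a01 * a10 / a11
    b_s = b0 - a01 * b1 / a11
    return a_s, b_s


def full_eigenvalues(a00, a01, a10, a11, eps):
    """Eigenvalues of [[a00, a01], [a10/eps, a11/eps]], sorted by magnitude.

    Only the real-eigenvalue case is handled in this sketch.
    """
    trace = a00 + a11 / eps
    det = (a00 * a11 - a01 * a10) / eps
    disc = trace * trace - 4.0 * det
    if disc < 0.0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return sorted([(trace - root) / 2.0, (trace + root) / 2.0], key=abs)


# Hypothetical coefficients, chosen so that a11 < 0 (stable fast subsystem).
a00, a01, a10, a11, b0, b1 = -1.0, 1.0, 1.0, -2.0, 1.0, 0.5
a_s, b_s = reduced_slow_model(a00, a01, a10, a11, b0, b1)
for eps in (0.1, 0.01, 0.001):
    slow, fast = full_eigenvalues(a00, a01, a10, a11, eps)
    print(eps, slow, a_s, fast, a11 / eps)
```

The printed slow eigenvalue differs from a_s by a term of order eps, while the fast eigenvalue stays close to a11/eps, which is the time-scale separation that the multimodel construction relies on.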


3.4.  Weak  Limit  of  the  Fast  Stochastic  Variable 

Before  we  formulate  the  low-order  problems,  we  would  like  to 
establish  the  "weak"  limit  (limit  in  the  sense  of  distributions)  of  the  fast 
stochastic  variable  which  will  be  shown  to  be  the  valid  limit  for  substitution 
into  the  cost  functionals.  The  formal  white  noise  limit  (3.22)  is  not  valid 
for  substitution  into  the  cost  functionals  since  it  gives  rise  to  some  ill- 
defined  terms  like  the  integral  of  the  variance  of  white  noise  [31]. 

The  following  results  are  needed: 

Lemma 3.1: Let f(t) be a function satisfying the following conditions

i)  $f(t) \ge 0$ for all t

ii)  $\int_{-\infty}^{\infty} f(t)\,dt = 1$.

Then the following distribution convergence is obtained.

$\lim_{\mu \to 0} \frac{1}{\mu}\, f(t/\mu) = \delta(t).$
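The convergence in Lemma 3.1 can be checked numerically by integrating the scaled kernel against a smooth test function g and watching the result approach g(0). The sketch below is an illustration only: it assumes a standard Gaussian density for f and g(t) = cos(t), both of which are choices made here and not in the text, and it writes the small scale parameter as mu.

```python
import math


def gaussian_kernel(s):
    """Standard normal density: nonnegative with unit integral."""
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)


def smeared_value(g, mu, half_width=10.0, steps=20000):
    """Approximate the integral of (1/mu) * f(t/mu) * g(t) dt.

    With the substitution s = t/mu this equals the integral of f(s)*g(mu*s) ds,
    evaluated here by the midpoint rule on [-half_width, half_width].
    """
    ds = 2.0 * half_width / steps
    total = 0.0
    for n in range(steps):
        s = -half_width + (n + 0.5) * ds
        total += gaussian_kernel(s) * g(mu * s) * ds
    return total


# As mu shrinks, the smeared value should approach g(0) = cos(0) = 1.
for mu in (1.0, 0.1, 0.01):
    print(mu, smeared_value(math.cos, mu))
```

For this particular pair the integral has the closed form exp(-mu^2/2), which gives an independent check on the quadrature.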


Lemma 3.2: Let $z(t) = \frac{1}{\sqrt{\mu}} \int_0^t e^{A(t-\tau)/\mu} L\, dw(\tau)$, where w is a Wiener process with
$E\{dw(\tau_1)\,dw'(\tau_2)\} = W\delta(\tau_1 - \tau_2)\,d\tau$. Then, $\lim_{\mu \to 0} z(t) = \bar w$ "weakly" for each t > 0, where
$\bar w$ is a constant Gaussian random vector with mean zero and variance $\bar W$ which
satisfies the equation

$A\bar W + \bar W A' + LWL' = 0.$
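The limiting variance in Lemma 3.2 can be illustrated numerically. Assuming z(t) is the scaled stochastic integral with a stable matrix A, its covariance at time t equals P(t/mu), where P solves dP/ds = A P + P A' + L W L' with P(0) = 0; the stretched time s = t/mu is a substitution introduced here for the sketch. As mu goes to zero, t/mu goes to infinity and P settles at the solution of the algebraic Lyapunov equation. The matrices below are hypothetical 2x2 examples, and the plain-list helpers are written out only to keep the block self-contained.

```python
def mat_mul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def mat_axpy(x, y, scale=1.0):
    """Return x + scale * y, elementwise."""
    return [[a + scale * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def lyapunov_rhs(a, p, q):
    """Right-hand side A P + P A' + Q of the covariance equation."""
    return mat_axpy(mat_axpy(mat_mul(a, p), mat_mul(p, transpose(a))), q)


def covariance_at(a, q, sigma, steps=4000):
    """Integrate dP/ds = A P + P A' + Q from P(0) = 0 to s = sigma (RK4)."""
    n = len(a)
    p = [[0.0] * n for _ in range(n)]
    h = sigma / steps
    for _ in range(steps):
        k1 = lyapunov_rhs(a, p, q)
        k2 = lyapunov_rhs(a, mat_axpy(p, k1, h / 2.0), q)
        k3 = lyapunov_rhs(a, mat_axpy(p, k2, h / 2.0), q)
        k4 = lyapunov_rhs(a, mat_axpy(p, k3, h), q)
        incr = mat_axpy(mat_axpy(k1, k2, 2.0), mat_axpy(k4, k3, 2.0))
        p = mat_axpy(p, incr, h / 6.0)
    return p


def max_abs(m):
    return max(abs(v) for row in m for v in row)


# Hypothetical data: A stable (eigenvalues -1 and -2), scalar noise intensity W.
A = [[-1.0, 1.0], [0.0, -2.0]]
L = [[1.0], [1.0]]
W = [[2.0]]
Q = mat_mul(mat_mul(L, W), transpose(L))  # L W L'

t = 1.0
for mu in (1.0, 0.2, 0.05):
    P = covariance_at(A, Q, t / mu)
    # Residual of the algebraic Lyapunov equation A P + P A' + L W L' = 0.
    print(mu, P, max_abs(lyapunov_rhs(A, P, Q)))
```

The residual shrinks as mu decreases, so for any fixed t > 0 the covariance of z(t) approaches the matrix solving the algebraic equation stated in the lemma.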


Setting  rewrite  equations  (3.1b)  as 


e.dz.  ■  X.z  dt  + A.  .z  dt  +  B.  .u.dt  + L.dw 
i  i  10  o  11  1  11  1  11 


(3.25) 


where w is a Wiener process whose formal derivative is the white noise process
of (3.1b). The integral representation of equation (3.25) can be written as


At/e  ,  t  A  (t-T)/e 

- «  So 


‘io  o  “ii  i' 


t  A^^(c-T)/e^ 


L^dw(T) 


(3.26) 


A  straightforward  application  of  Lemmas  3.1  and  3.2  yields  the  following 
"weak"  limit 


-  Ite  +»i. 


(3.27) 


where $\bar w_i$ is a constant Gaussian random vector with mean zero and variance
$\bar W_i$ which satisfies the equation

$A_{ii}\bar W_i + \bar W_i A_{ii}' + L_i W L_i' = 0.$  (3.28)


3.5.  Slow  Subproblem


The slow subproblem is formulated by setting
$\varepsilon_1 = \varepsilon_2 = 0$ on the left hand side of (3.1). The formal white
noise limit given by (3.22) is substituted into the state and observation
equations (3.1) and (3.2); and the weak limit (3.27) is substituted into the
cost functionals (3.4). This gives


z  "Az  +  £B.  u,  +Lw;  z  (0)  *  z 

os  s  os  js  Js  os  os  oo 


^is  ^is*os  ^Is'^is  "**  ^is* 


i-  1,2. 


(3.29) 


(3.30) 


Each player is constrained to use only an $n_0$-dimensional compensator of the form


^si  ^is*si  ^is ^^ia""^is*si  ^is'^is^  ^is'^is' 


The expected values of the cost functionals are given by

1 

E[J.  |Z  J  - E{z'  (T)f  .z  (T)  +  /(z’Q.Z  +22’Q.  u 

Is'  si  2  os  ol  os  ■'  os  ols  os  OS  IS  Is 

o 


(3.31) 


(3.32) 


The  decision  makers  select  the  matrices  F*^,  G*^,  the  Initial  conditions 

z*  (0) ;  and  the  closed- loop  control  laws  uj  (z  .(t),t)  such  that 

SX  IS  si 

S  i«.  (3.33) 

Applying Theorem 3.1, the equilibrium solution is found as


<«lt=i3-'-33'^I’''l3 


H*  - 
Is  Is 


i*. (0)  -  I 


(3.34a) 


(3.34b) 


(3.34c) 


(3.34d) 


(3.34e) 
where $K_{is}$ is the solution of the coupled Riccati equation

K.  +K.  A  +a’k.  +(Q  .  -Q.  rT^Q!  )-K.  S,  K,  -  K,  S,  K. 
is  is  s  s  is  ois  is  is  is  is  is  is  is  js  js 


-K,  S,  K.  -  0; 
js  js  is 


M(t) is a symmetric nonnegative definite matrix defined as


M(t)  ■  E{m(t)m'(t)}  ;  i(t) 


os 


z  -z  , 
os  si 

1  z  -z  _ 
L  os  s2. 


satisfying  the  differential  equation. 


M  -  F  M  +  MF'  +  B  @  b’ 
s  s  s  s  s 


with 


M,  .(0)  -  z  z’  +  N  ; 
ij  oo  oo  oo 


N 

oo 


i- j -0. 

else. 


The  expected  value  of  the  optimal  cost  is  obtained  as 

*2  "  ' 


where 


-  tr{/  {[Q,  +K,  B,  ]r7^[QJ  +b!  K,  ]M,,+K,  B,  r7^(b'  K, 
;  is  is  is''  is  is  is  is'  il  is  js  js  js  js 


+  QL)M,„+(Q4  +K4  B.  )r7^b'  K,  M,}dT}. 
js  jo  '^js  js  js  js  js  is  oj 


(3.35) 


(3.36) 


(3.37) 


(3.38) 


(3.39) 


3.6.  Fast  Subproblems

The fast subproblems are "local" problems for each decision maker.
These are stochastic control problems because the decision makers do not
interact at the fast subsystem level. Assuming that the slow variables are
constant during the fast transients, we obtain


®l*lf  "  ^ll*lf  ®ll“lf  ^l'' 

''ll 

■^if "  I  <®i*i((^>^i*ii(w +/  “ 


The  optimal  u*^  minimizing  is  obtained 

principle,  so  that 

where  satisfies  the  Riccati  equation 

®i^if  "  ■*^if^ii"''^ii*^if  ■^i''’^if^ii*^if  ’ 


(3.4 

(3.4 

ifEiUif)dt},  (3.4 

by  applying  the  separation 

(3.4 

Kif(T)  -  f^;  (3.4 


$\hat z_{if}$ is the output of the Kalman filter given by


I  f: 


i  0 


Vif  “  ^  ’  ^if  “  ^io 


^^ii"®ii^if  "^if '^ii^  ^if  ''■  ^if  ^ii^ii  '•■  ^ii^ii 


*-  *.  • 

K  ■•■- 

S  “ 


$P_{if}$ is the error covariance of $\hat z_{if}$ satisfying


(3.45) 


SiPif  -  PifAii  +  A^iPif  +  LiWL^-PifTiiPif  ;  Pif(0)-N^^.  (3.46) 

Under the Assumptions (a) and (b), the limiting behavior of $\hat z_{if}$,
$u_{if}^*$, $K_{if}$ and $P_{if}$ as $\varepsilon_i \to 0$ has been considered in [31]
and is summarized below:


“If  ■  =:f^«<<  > 


(3.47a) 


'if  ■  'lf*'’<'i  ^ 


(3.47b) 


Kif  .  K^,  +  0(e^) 


(3.47c) 


P,f  -  ^f  +  0(e,) 


(3.47d) 


where  P^^,  z^^,  and  u*^  satisfy 


(3.48a) 


K- 


k'.  -I 


^if^ii'^^ii^if  "^^i'^^if^ii^^if  *  ° 

“if  ■  “lAf'if 


(3.48b) 


(3.48c) 


(3.48d) 


,1 


The  expected  value  of  the  optimal  cost  is  given  by 
T  £ 


*T  'io‘=lf«»no- 


(3.49) 


In the limit as $\varepsilon_i \to 0$ this reduces to

$E(J_{if}^*) = T\, \mathrm{tr}\{[\tfrac{1}{2} Q_i + \bar K_{if} \bar P_{if} T_{ii}]\, \bar P_{if}\}.$  (3.50)


The approximations obtained from equations (3.47) and (3.48) are valid only
on a subinterval $[t_1,t_2] \subset (0,T)$ because the "boundary-layer" terms have
been neglected.


3.7.  Limiting  Behavior  of  the  Optimal  Solution


The multimodel strategy pair used by the decision makers is given by

$u_{im}^* = u_{is}^* + u_{if}^* = -R_{is}^{-1}[B_{is}'K_{is} + Q_{is}']\,\hat z_{si} - R_i^{-1}B_{ii}'K_{if}\,\hat z_{if}, \quad i = 1,2$  (3.51)

where $\hat z_{si}$ and $\hat z_{if}$ are the states of the $n_0$- and $n_i$-dimensional
compensators given by (3.31) and (3.48c), respectively.

We shall now examine the limiting behavior of the exact solution
(3.11)-(3.14). For the sake of brevity, the detailed manipulations involved
in taking the limit of matrix equations as $\|\varepsilon\| \to 0$ are omitted.
Let the solution of (3.12), $K_i$, be of the form


Kgo^(e) 


£2^22^^) 


;  i-1,2. 

(3.52) 


Substituting this in (3.12) and taking the limit as $\|\varepsilon\| \to 0$, it can be
shown that


where 


kJo^(0  -K^3  +  0(i|dl) 

K^J^(£)  -Kj,gE^-Ej^  +  0(||e||) 
K^J^CO-K^gEj+Odlell) 

Kij'^e).K^^  +  0(|^ll) 

K^j^e)  -O(ileii) 

Kj^j^e)  -O(llell) 


^i  “  ^^Oi^if  ■‘^Ol^^^ii  "  ^ii^if^ 
^i  "^iO^if  ^\i  ”  ®ii^i£^ 


(3.53) 


Let the solution of (3.14), M(t), be of the form

"oo(^> 

Mj^j^(£) 

M22(0” 

Mii(e) 

Mli(e) 

Mi2(0 

(3.54a) 

M22 (e) 

Ml2(0 

M22CO 

where  each  block  is  of  the  form 


v^M°^(e)~ 
2  OO'- 

12' 

m12(c) 

12' 

“ii  <=> 

hIIm  . 

- 

»llu) 

MjJCc) 

M^jCe) 

Substituting (3.54) in (3.14) and taking the limit as $\|\varepsilon\| \to 0$, it can be
shown that,

Mjj(e)  «  M^^  +  0(jle!l) 
mJJ(£)  -  +  0(|ieil) 

M^J(e)  -  O(llell) 

-MQQ  +  0(llell) 

MjQ(e)  -  Odicll) 

M°2(0  -Mi2  + 0(11^11) 

mJJcO  -  O(lleii) 


(3.54b)


-  0(11  eih  . 


(3.55a) 
The form of the compensator equations (3.58) is identical to the form of the
state equations (3.1). This form permits easier manipulations to obtain
their limiting behavior.

Now,  we  transform  the  equations  (3.58),  in  order  to  separate  its 
slow  and  fast  components.  The  transformation  and  its  inverse  are: 


'’o 

Iq^^^K^T^“V  E2N2T2 

^0 

ai 

h  0 

*1 

I 

<>< 

M 

0 

_ 1 

3. 

1 

0 

5^ 

1q  v^j^N^  V^2^2 

"^o" 

m 

-Tj_  V^Tj_N2 

3^ 

-T2  ■^'”l‘^2®’l  ^2  "^^^2^2 

/2_ 

where,


(3.59a) 


(3.59b) 


(3.60a) 


+'/e.A„,  -GnC,  +  (3. 


60b) 


-*11^1  -  *10  -  '  A(*00  -*0lh -\2h'>  * 

m  wm  mm  ^m 

1^01  '  S*"i  ^i^^OO  "  ^oi 

Transforming (3.58) using (3.59) and (3.60) results in

+Vrj^NiTiAoiI  -  [N^C  V2‘^Vo^2^^^^  ^1^lV2  "7^  V2 

+v'T2N2T2Aq2]  n2  +  [(Iq  '  ^2^  2^^"  (3.61a) 

^1^1  "  f  "1^1^01  ■^-Sl^ \  ^^^1^02+^11^12^  ’'2  +  f  (3.61b) 

^2  n2  -  Ce2T2AoL+e2/2lJn  1  +  [^2^02  +A22ln  2  +  ' 


(3.61c) 
It can be shown that the limiting solution of (3.61) is


+  0(11  eli^) 


n<^>J 


n(l> 

2 


+  0(||e||‘^') 


+  0(11 


(3.62a) 


(3.62b) 


(3.62c) 


where 


(3.63) 


The limiting solution of $\eta_0$ is just the compensator of the slow subproblem;
and the limiting solution of one component of $\eta_i$ is the Kalman filter of
the fast subproblem. The other component of $\eta_i$, which is the estimate of
the ith fast state by the jth decision maker, tends to a filter based on
the a-priori information which is all the jth decision maker knows about the
ith fast subsystem. This estimate is of no use to him since his near-optimal
strategy given below does not need this information.

The  equilibrium  strategies  are  approximated  as. 


i=  1,2. 


(3.64) 


The  optimal  expected  values  of  the  performance  indices  are  approximated  as, 


E{jJ|X^}  -  |[i^(0)K^(0)i^(0)  +  tr{M^^(0)K^(0)}  +  b^(0)] 

+  |[ri(0)Kig(0)Zsi(0)  +  tr{Mii(0)Kig(0)}  +  b^g(O)]  +  0(1  el) 


E{jJs|Zsi}  +  E{jJ^}  +  O(lel);  i.- 1,2. 


(3.65)


Equations (3.64) and (3.65) are obtained by substituting the limiting values
of $K_i$ and M. To get (3.64), $\hat x_i$ also had to be transformed using a
transformation similar to (3.59).

The multimodel nature of the problem is apparent from the form of
the near-optimal strategies (3.64), which suggests that the ith decision
maker needs only to model the dynamics of his own fast subsystem and the
common slow subsystem.

The structure of the near-optimal scheme is similar to that of the
deterministic problem treated in [15,16], in the sense that the fast
subproblems are control problems different for the two decision makers and the
slow game problem is common to both the decision makers. This is essentially
due to the fact that in both cases the fast subsystems are weakly-coupled and
are controlled by a single decision maker. In situations when this is not
true, the near-optimal solution will be quite different, as has been
demonstrated for deterministic problems in Chapter 2.

The overall near-optimal filtering-control scheme is depicted in
Fig. 3.1. The hierarchical nature of the filter implementation, wherein the
estimate of the slow filter is one of the driving inputs to the fast filter,
can be seen from the figure. This arises naturally due to the fact that
the innovations process driving the fast filter needs the "fast" output which
is generated from the actual output by subtracting out its "slow" part
formed from the slow estimate. This fact has been pointed out in [34] for
the single parameter control problem.

3.8.  Conclusions 

A decentralized filtering and control scheme has been presented
for two decision makers controlling a large scale system. It is shown that
in order to obtain near-equilibrium Nash strategies, the decision makers
need only solve two decoupled low-order problems: a stochastic control
problem in the fast time-scale at their "local" level, and a joint slow game
problem with finite-dimensional state estimators. This leads directly to a
multimodel situation wherein each decision maker needs to model only his
local dynamics and some aggregate dynamics of the rest of the system. The
advantages of using the proposed scheme are apparent. The decoupling of
solutions at the subsystem level would result in considerable computational
saving. Also, since the near-optimal strategies need only decentralized
"state estimates," each decision maker needs to construct only two filters
of dimensions $n_0$ and $n_i$, respectively, instead of constructing one filter of
dimension $n_0 + n_1 + n_2$ as required by the equilibrium solution. This would
result in lower implementation costs.

It is to be noted that the problem addressed in this chapter is
quite different from the earlier problems on filtering and control of
stochastic singularly perturbed systems. The earlier work focused on
appropriately characterizing the limiting behavior of the fast variable in
the presence of white noise to obtain well-posed lower order problems. The
high-order optimal singularly perturbed Kalman filter was shown to decompose
into two low-order Kalman filters in the slow and fast time-scales in the
limit as $\varepsilon \to 0$. The problem with multiple decision makers possessing
differing observations under a multimodel situation has been addressed here
for the first time. Since the estimators for this problem are not Kalman
filters, the earlier results could not be applied here. Therefore we had to
examine the limiting behavior of the particular estimator structure adopted
for the optimal solution. The result shows that in the slow time-scale the
estimator retains the same structure as the optimal, but in the fast
time-scale it turns out to be a Kalman filter. Furthermore, we have established
the "weak" convergence of the fast variable which is shown to be the valid
limit for substitution in the cost functionals; a fact which had not been
established so far.


CHAPTER  4 


A  MULTIMODEL  APPROACH  TO  STOCHASTIC  TEAM  PROBLEMS 

4.1.  Introduction 

In this chapter we continue to study the role of time-scales in
multimodeling of stochastic linear systems. We shall demonstrate the
well-posedness of multimodel generation by "k-th parameter perturbation" for
both static and dynamic team problems under certain quasi-classical information
structures. The weak-coupling assumption on the fast subsystems is retained.

In Section 4.2, the general dynamic team problem with sampled
observations and quasi-classical information pattern is formulated. In
Section 4.3, a multimodel solution is obtained for the static team problem.
Then, in Section 4.4, the solution of the static team problem is utilized to
obtain a multimodel solution to the dynamic team problem under the
one-step-delay observation-sharing pattern. In both cases, the multimodel
solution is shown to be well-posed, in the sense that it is the asymptotic
limit of the optimal solution as the small parameters go to zero. The chapter
concludes with Section 4.5.


4.2.  Problem  Formulation 

The system under consideration consists of a strongly-coupled slow core and weakly-coupled fast subsystems controlled by two decision makers. It is modeled by the Ito differential equations

$$dz_0 = \Big(A_{00} z_0 + \sum_{i=1}^{2} A_{0i} z_i + \sum_{i=1}^{2} B_{0i} u_i\Big)\,dt + \sum_{i=1}^{2} F_{0i}\,dw_i; \quad z_0(t_0) = z_{00}$$

$$\varepsilon_i\,dz_i = \big(A_{i0} z_0 + A_{ii} z_i + \varepsilon_{ik} A_{ik} z_k + B_{ii} u_i\big)\,dt + \sqrt{\varepsilon_i}\,F_{ii}\,dw_i; \quad z_i(t_0) = z_{i0}$$

$$t \ge t_0; \quad i,k = 1,2; \quad i \ne k \tag{4.1}$$

where dim $z_0 = n_0$, dim $z_i = n_i$, and $\{u_i(t);\ t \ge t_0\}$ are $m_i$-dimensional stochastic processes denoting the controls of DM$_i$. $\{w_i(t);\ t \ge t_0;\ i=1,2\}$ are standard Wiener processes independent of each other. The small singular perturbation parameters $\varepsilon_i > 0$ represent small time-constants, inertias, masses, etc., while the small regular perturbation parameters $\varepsilon_{ik}$ represent weak-coupling between the subsystems. The states $\{z_i;\ i=1,2\}$ are fast since their derivatives are of order $1/\varepsilon_i$. The matrices $\{A_{ii};\ i=1,2\}$ are assumed to be nonsingular.

The initial conditions are assumed to have Gaussian statistics with known parameters which will be specified later. The decision makers make independent decentralized sampled measurements. Specifically, it is assumed that a $p_i$-dimensional observation

$$y_i(j) = C_{i0} z_0(t_j) + C_{ii} z_i(t_j) + v_i(j); \quad i = 1,2 \tag{4.2}$$

is available to DM$_i$ at the sampled time instant $t_j$, where $j = 0,1,\ldots,N-1$ and $t_0 < t_1 < \ldots < t_N = t_f$. Denote the index set of time samples by $\theta = \{0,1,\ldots,N-1\}$. Then the random vectors $\{v_i(j),\ j \in \theta,\ i=1,2\}$ are assumed to have independent Gaussian statistics $\{v_i(j) \sim N(0,R_{ij}),\ R_{ij} > 0,\ j \in \theta,\ i = 1,2\}$, and are independent of the process noise $w_i(t)$ and the initial conditions.


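To exhibit the slow and fast variables explicitly, we use the following transformation:

$$\begin{bmatrix}\eta_0\\ \eta_1\\ \eta_2\end{bmatrix} = \begin{bmatrix} I - \varepsilon_1 M_1 N_1 - \varepsilon_2 M_2 N_2 & -\varepsilon_1 M_1 & -\varepsilon_2 M_2\\ N_1 & I & 0\\ N_2 & 0 & I\end{bmatrix}\begin{bmatrix} z_0\\ z_1\\ z_2\end{bmatrix} \tag{4.3a}$$

which has an explicit inverse

$$\begin{bmatrix} z_0\\ z_1\\ z_2\end{bmatrix} = \begin{bmatrix} I & \varepsilon_1 M_1 & \varepsilon_2 M_2\\ -N_1 & I - \varepsilon_1 N_1 M_1 & -\varepsilon_2 N_1 M_2\\ -N_2 & -\varepsilon_1 N_2 M_1 & I - \varepsilon_2 N_2 M_2\end{bmatrix}\begin{bmatrix}\eta_0\\ \eta_1\\ \eta_2\end{bmatrix} \tag{4.3b}$$

where $\{N_i, M_i;\ i=1,2\}$ satisfy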

AiiNi  -  A^q  -  ^ii^ik\  "  ° 

MiCA^i  +  s^N^Aq^)  -  Aq^  +  Wk'^0i”^i(A00-^)i*'i-^k\>”i  +  ^kk^k'Hci  "  ° 

i,k  -  1,2;  l>*k  .  (4.4) 


The existence of solutions to (4.4) is guaranteed by the assumption that $\{A_{ii};\ i=1,2\}$ are nonsingular [15].
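As a numerical sanity check on the explicit inverse, the sketch below builds a three-block decoupling transformation of the form $\eta_0 = (I - \varepsilon_1 M_1 N_1 - \varepsilon_2 M_2 N_2) z_0 - \varepsilon_1 M_1 z_1 - \varepsilon_2 M_2 z_2$, $\eta_i = N_i z_0 + z_i$, together with its claimed inverse, and multiplies the two. The block layout coded here is an assumption read off the transformation (4.3); the matrices `N1, N2, M1, M2` are random placeholders, not solutions of (4.4), since the inverse identity holds for any such matrices.

```python
import numpy as np

def chang_transform(N1, N2, M1, M2, eps1, eps2):
    """Build the assumed transformation T and its claimed inverse as block matrices."""
    n0 = N1.shape[1]
    n1, n2 = N1.shape[0], N2.shape[0]
    I0, I1, I2 = np.eye(n0), np.eye(n1), np.eye(n2)
    T = np.block([
        [I0 - eps1 * M1 @ N1 - eps2 * M2 @ N2, -eps1 * M1, -eps2 * M2],
        [N1, I1, np.zeros((n1, n2))],
        [N2, np.zeros((n2, n1)), I2],
    ])
    T_inv = np.block([
        [I0, eps1 * M1, eps2 * M2],
        [-N1, I1 - eps1 * N1 @ M1, -eps2 * N1 @ M2],
        [-N2, -eps1 * N2 @ M1, I2 - eps2 * N2 @ M2],
    ])
    return T, T_inv

# Hypothetical dimensions and random placeholder matrices.
rng = np.random.default_rng(0)
n0, n1, n2 = 3, 2, 2
N1, N2 = rng.normal(size=(n1, n0)), rng.normal(size=(n2, n0))
M1, M2 = rng.normal(size=(n0, n1)), rng.normal(size=(n0, n2))

T, T_inv = chang_transform(N1, N2, M1, M2, eps1=0.05, eps2=0.02)
inverse_error = np.max(np.abs(T @ T_inv - np.eye(n0 + n1 + n2)))
print("max |T T^-1 - I| =", inverse_error)
```

The product equals the identity up to rounding regardless of the particular $N_i$, $M_i$; the conditions (4.4) are what make the transformed dynamics decouple, not what make the transformation invertible.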

The  transformed  system  and  the  observations  of  each  DM  can  now  be 

written  down  as, 


(Ao(e)Tlo 


2 

+  Z 

J- 


A  2 

J  \  ‘‘"ji 


+'/^i(Fii(e)dWj 


+  F^j^(e)dWk);  n,(tQ)  - 

^  "  ^10  j ^  ■*"  ^ii >^i  j  >  +  c^k >^k ^  +  ''i<J ) 


where 


t  ^  tQ»  “  1»2;  i  }^  k;  j  S  9 


*0^^^  "  ^0*^l**l*^2**2 


Aj(£)  -Ait+^AAoj 


AjijCe)  •  S4''iAp^+ fi^Ajij 


Bot(0 


■  »ot-'^“u-^Woi'=kWoi 


Bn<0 


"ii+'A”oi 


^„.U)  - 


"iVok 


Cjn(e) 


^iO-^il^i 


<=«<'>  ■  =it-'i«tiWH'=io«t 


=tk<'> 


'k‘^to“k''k°ll’'A 


BqjCe)  “  ^Ol''^Vll"'lVl^Ot''kWoi 
Fii(0  -  FuVhVoI 


F^k<^)  ■'/nVok  :  1.  k-1.2;  1  A  k. 


Notice that in this representation the slow and fast dynamics are completely decoupled and, further, as $\|\varepsilon\| \to 0$ (with $\varepsilon$ collecting the small parameters), the system matrix of (4.5) becomes block-diagonal. Without loss of generality, we shall be working with the representation (4.5) instead of (4.1) and (4.2).

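With respect to the representation (4.5), assume that the statistics of the initial state vector are given by

$$\begin{bmatrix}\eta_{00}\\ \eta_{10}\\ \eta_{20}\end{bmatrix} \sim N\!\left(\begin{bmatrix}\bar\eta_{00}\\ \bar\eta_{10}\\ \bar\eta_{20}\end{bmatrix},\ \Sigma_0\right), \quad \Sigma_0 = \begin{bmatrix}\Sigma_{00} & \sqrt{\varepsilon_1}\,\Sigma_{01} & \sqrt{\varepsilon_2}\,\Sigma_{02}\\ \sqrt{\varepsilon_1}\,\Sigma_{01}' & \Sigma_{11} & \sqrt{\varepsilon_1\varepsilon_2}\,\Sigma_{12}\\ \sqrt{\varepsilon_2}\,\Sigma_{02}' & \sqrt{\varepsilon_1\varepsilon_2}\,\Sigma_{12}' & \Sigma_{22}\end{bmatrix} \tag{4.7}$$

The reason for assuming the particular form of the covariance matrix $\Sigma_0$ in (4.7) is that, together with (4.5a), we get cov$(\eta_0,\eta_i) = O(\sqrt{\varepsilon_i})$ and cov$(\eta_1,\eta_2) = O(\sqrt{\varepsilon_1\varepsilon_2})$ for all $t \ge t_0$. If we drop the small parameters from $\Sigma_0$, then the above covariance relations will hold only for $t > t_0$ outside some boundary-layer. The results obtained in the sequel would still be true since the contribution of the boundary-layer terms is only $O(\|\varepsilon\|)$. Assuming the particular form in (4.7) simplifies the algebra.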

We now adopt a quasi-classical information pattern for this decision problem, and follow the formulation of [36]. Specifically, it is assumed that the DMs exchange their independent sampled observations with a delay of one sampling interval. Such an information pattern is known as the one-step-delay observation-sharing pattern [37]. Hence, the information available to DM$_i$ in the time interval $[t_j, t_{j+1})$ is $\alpha_i^j$, where

$$\alpha_i^j = \{y_i(j),\ \delta_{j-1}\} \tag{4.8a}$$

and $\delta_{j-1}$ denotes the common information available to the decision makers in the same sampling interval, i.e.,

$$\delta_{j-1} = \{y_1(j-1), y_2(j-1), \ldots, y_1(0), y_2(0)\}. \tag{4.8b}$$

Consider the sigma-algebra generated by the information set $\alpha_i^j$, and the class of second-order stochastic processes $\{u_i(t),\ t \ge t_0\}$ which satisfy the requirement that their restriction to the interval $[t_j, t_{j+1})$ is measurable with respect to this sigma-algebra, for all $j \in \theta$. Then a permissible strategy for DM$_i$ is a mapping $\nu_i$ from $[t_0, t_f]$ and the available information into $\mathbb{R}^{m_i}$, such that $\nu_i(\cdot, \alpha_i)$ belongs to this class. Denote the class of all such strategies for DM$_i$ by $\Gamma_i^N$. It should be noted that for each pair of elements in $\Gamma_1^N \times \Gamma_2^N$ the stochastic differential equation (4.5a) admits a unique solution whose sample paths are continuous [38].
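The bookkeeping behind the one-step-delay observation-sharing pattern can be sketched in a few lines. The function below is illustrative only: its name and the use of strings as stand-ins for observation vectors are assumptions, not part of the formulation in [36]. For each sampling index j it returns, for each DM, the pair (own current sample, all samples of both DMs up to index j-1).

```python
def one_step_delay_information(y1, y2):
    """Return info[i][j] = (y_i(j), shared samples of both DMs for indices < j)."""
    assert len(y1) == len(y2)
    samples = {1: y1, 2: y2}
    info = {1: [], 2: []}
    for j in range(len(y1)):
        # Common information: everything both DMs observed before index j.
        common = [(y1[k], y2[k]) for k in range(j)]
        for i in (1, 2):
            info[i].append((samples[i][j], common))
    return info

# Symbolic stand-ins for the sampled observations (hypothetical labels).
info = one_step_delay_information(
    ["y1(0)", "y1(1)", "y1(2)"],
    ["y2(0)", "y2(1)", "y2(2)"],
)
for j, (own, common) in enumerate(info[1]):
    print(f"DM1, interval {j}: own={own}, shared={common}")
```

Note that at index j each DM sees the other's observations only up to j-1, which is what distinguishes this pattern from full (classical) information sharing.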

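For each $\{\nu_i \in \Gamma_i^N;\ i=1,2\}$ define the quadratic, strictly convex cost function as

$$J(\nu_1,\nu_2) = E\Big\{\eta_0'(t_f) Q_{0f}\,\eta_0(t_f) + \sum_{i=1}^{2} \varepsilon_i\,\eta_i'(t_f) Q_{if}\,\eta_i(t_f) + \int_{t_0}^{t_f}\Big(\eta_0' Q_0 \eta_0 + \sum_{i=1}^{2}(\eta_i' Q_i \eta_i + u_i' u_i)\Big)\,dt \ \Big|\ u_i(t) = \nu_i(t,\alpha_i),\ i=1,2\Big\} \tag{4.9}$$

where $\{Q_{if} \ge 0,\ Q_i \ge 0;\ i=0,1,2\}$, and the expectation operator is taken over the underlying statistics.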

Then an optimal solution for this dynamic team problem is a pair $\{\nu_i^* \in \Gamma_i^N,\ i=1,2\}$ such that

$$\inf_{\Gamma_1^N}\ \inf_{\Gamma_2^N}\ J(\nu_1,\nu_2) = J(\nu_1^*,\nu_2^*). \tag{4.10}$$


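Defining $x' = [\eta_0'\ \eta_1'\ \eta_2']$ and $w' = [w_1'\ w_2']$, equations (4.5) and (4.9) can be written in a composite form as

$$dx = \Big(Ax + \sum_{j=1}^{2} B_j u_j\Big)\,dt + F\,dw; \quad x(t_0) = x_0 \tag{4.11a}$$

$$y_i(j) = C_i\,x(t_j) + v_i(j); \quad i=1,2; \quad j \in \theta \tag{4.11b}$$

$$J(\nu_1,\nu_2) = E\Big\{x'(t_f) Q_f\,x(t_f) + \int_{t_0}^{t_f}(x'Qx + u_1'u_1 + u_2'u_2)\,dt \ \Big|\ u_i(t) = \nu_i(t,\alpha_i),\ i=1,2\Big\} \tag{4.12}$$

where $A$, $B_i$, $F$ and $C_i$ are the block matrices assembled from the corresponding coefficients of (4.5) (the rows associated with $\eta_i$ carry the factor $1/\varepsilon_i$ in $A$ and $B_i$, and $1/\sqrt{\varepsilon_i}$ in $F$), and

$$Q_f = \text{block diag}\,[Q_{0f},\ \varepsilon_1 Q_{1f},\ \varepsilon_2 Q_{2f}], \quad Q = \text{block diag}\,[Q_0,\ Q_1,\ Q_2]. \tag{4.13}$$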


The following assumptions are made in order to guarantee the existence of a unique limit, as $\|\varepsilon\| \to 0$, of the optimal solution.

Assumptions:

a) Re $\lambda(A_{ii}) < 0$; i=1,2

b)  ^)  is  controllable-observable;  i=l,2. 

Before obtaining the solution of the dynamic team problem defined by (4.10), (4.11), and (4.12), we first consider its static version (obtained by setting N=1) in the next section.




4.3.  Static  Team  Problem 

In the static version of the dynamic team problem formulated in the last section, the decision makers make noisy linear observations of the random initial state, and do not require any further information as the decision process proceeds. Hence, the static version can be recovered from the general formulation by setting N=1.

To this end, let the observation of DM$_i$ be given as

$$y_i = C_i x_0 + v_i; \quad i=1,2 \tag{4.14}$$

where $v_i \sim N(0,R_i)$ and $x_0 \sim N(\bar{x}_0,\Sigma_0)$, and these random vectors are statistically independent.

An optimal solution for the static team problem defined by (4.11a), (4.14), and (4.12) is a pair $\{\nu_i^* \in \Gamma_i^1;\ i=1,2\}$ such that

$$J(\nu_1^*,\nu_2^*) = \inf_{\Gamma_1^1}\ \inf_{\Gamma_2^1}\ J(\nu_1,\nu_2). \tag{4.15}$$

The  unique  optimal  solution  to  this  problem  is  given  in  [36],  and  can  also 
be  found  in  Appendix  C. 

Due to the presence of widely separated eigenvalues, the differential equations (C2)-(C17) involved in computing the optimal solution are numerically stiff. This renders the optimal solution computationally infeasible, especially when the order of the system is very large. Sometimes it is even difficult to obtain the optimal solution; e.g., when the small perturbation parameters are unknown, or when one DM does not have a knowledge of the fast dynamics of the other DM. In such cases we need to look for other suboptimal solutions. The multimodel solution proposed here does not require every DM to have an exact knowledge of the fast dynamics of the other DMs. Moreover, as we shall see later, it is well-posed in the sense that it tends to the optimal solution in the limit as the small parameters go to zero.

Before we propose the multimodel solution to the static team problem, we need the following result from Chapter 3.

Lemma 4.1: Let $\eta_i(t)$ satisfy the Ito differential equation

$$\varepsilon_i\,d\eta_i = (A_{ii}\eta_i + B_{ii}u_i)\,dt + \sqrt{\varepsilon_i}\,F_{ii}\,dw_i \tag{4.16}$$

where $w_i$ is a standard Wiener process, $u_i$ is a known function of time, and Re $\lambda(A_{ii}) < 0$. Then $\eta_i(t) \to \eta_{is}(t)$ weakly as $\varepsilon_i \to 0$, where

$$\eta_{is}(t) = -A_{ii}^{-1}B_{ii}\,u_i(t) + \bar{w}_i, \tag{4.17}$$

and $\bar{w}_i$ is a constant zero mean Gaussian random vector with variance $W_{ii}$ satisfying the Lyapunov equation

$$A_{ii}W_{ii} + W_{ii}A_{ii}' + F_{ii}F_{ii}' = 0. \tag{4.18}$$

The weak limit of $\eta_i(t)$ has been shown to be the appropriate limit for eliminating the variable $\eta_i(t)$ from the cost functional J to obtain the slow cost (Chapter 3).
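To make the role of the Lyapunov equation concrete, the following sketch solves an equation of the form $AW + WA' + FF' = 0$ for a small stable matrix. It is a minimal illustration under stated assumptions: the matrices `A` and `F` are made-up examples, and the Kronecker-product solve is a generic technique chosen for self-containment, not the method used in the thesis. Writing the fast equation as $d\eta = (A/\varepsilon)\eta\,dt + (F/\sqrt{\varepsilon})\,dw$ shows why the stationary covariance does not depend on $\varepsilon$: the factor $1/\varepsilon$ multiplies every term of the Lyapunov equation and cancels.

```python
import numpy as np

def stationary_covariance(A, F):
    """Solve A W + W A' + F F' = 0 for W using the vectorized (Kronecker) form."""
    n = A.shape[0]
    I = np.eye(n)
    # With row-major vectorization: vec(A W) = kron(A, I) vec(W),
    # and vec(W A') = kron(I, A) vec(W).
    K = np.kron(A, I) + np.kron(I, A)
    w = np.linalg.solve(K, -(F @ F.T).reshape(-1))
    return w.reshape(n, n)

# Hypothetical stable fast-subsystem matrices (eigenvalues -1 and -2).
A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
F = np.array([[1.0],
              [0.5]])

W = stationary_covariance(A, F)
residual = np.max(np.abs(A @ W + W @ A.T + F @ F.T))

# Scaling A by 1/eps and F by 1/sqrt(eps) leaves the solution unchanged.
eps = 0.01
W_scaled = stationary_covariance(A / eps, F / np.sqrt(eps))
print("residual:", residual)
print("max difference after scaling:", np.max(np.abs(W - W_scaled)))
```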

The  multimodel  solution  is  obtained  by  solving  the  following  low- 
order  problems. 

4.3.1.  Slow  subproblem 

This is a static team problem obtained by taking the limit as $\|\varepsilon\| \to 0$ in the original problem defined by (4.11a), (4.14), and (4.12),

2  2 

■*^0.  ■  'Vos  *  “oi  “l,>  *  .L 
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I  m 


*  =10  ’>00  +  '^1  •  yr=ii  ’'lo'  ^‘’'2 


(4.20) 


^nn  *“  N(0,R.) 


(4.21) 


•^s ‘''ls*'^2s^  "  ®  ^Of  ^Os  *^0  ^Os  “is  ^is  “is^ 


where 


dt|u^^(t)  -v^gCt,®^).  1-1,2}  +  Jq 


Ri.  ■  I  +  wli  Bjj)'  all  "ii) 


(4.22a) 


(4.22b) 


•^0  " 


(4.22c) 


is  the  symmetric  nonnegative  definite  solution  of  the  Lyapunov 
equation  (4.18). 

The unique optimal team solution to the slow subproblem defined by (4.19)-(4.22) is given by Theorem 2 of [36]:

“is^*^^  "  ^is  ^^i'^lO^OO’^ii^iO^  "  ^is  ®0i  ®s  (4.23) 

where $S_s(t)$ is the nonnegative definite solution of the Riccati equation

K  *  V.  +  S^Ao  -  S.  (Ej,  +  Ej,)  S,  +  Q„  -  Oj  S,(tp  -  (*.24a) 


\s<‘)  ■  <V=l3^-®2a®s>  ’>0s''=»  ’'o."=0>  ■’>00 


(4.24b) 


'is  ■  H\  “Ol  'is  '^is-^js’^ls'  -  'li  '0l  •=ls'  ^  J 


(4.24c) 


$S_{is}(t)$ is the nonnegative definite solution of the Riccati equation
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hs  +  ^0  Si3  +  ha  Aq  -  ha  ha  ^is  +  ^0  "  °*  ha^h^  "  ^Of’ 
and 


•is  'V^l.®!.'  h,  *  ®1.  '''l.^l.I'Js^l.’ !  fl.<‘o>  ■  “=  ^  *  1 
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(4.24f) 


(4.24g) 
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(4.24h) 


The minimum value of $J_s$ is given by
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(4.26a) 


(4.26b) 


(4.26c) 


(4.26d) 


(4.26e) 


(4.26f) 


4.3.2.  Fast  subproblems 

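These are stochastic control problems for each DM (i=1,2), which are defined by the state equation

$$\varepsilon_i\,d\eta_{if} = (A_{ii}\eta_{if} + B_{ii}u_{if})\,dt + \sqrt{\varepsilon_i}\,F_{ii}\,dw_i; \quad \eta_{if}(t_0) = \eta_{i0} \tag{4.27}$$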


the  initial  state  measurement 


^if  “  *^il^i0  ■'■  ^i  “  ^i  “  ^iO  ^0( 

Tlio-  N(^^q,  N(0,Rj^) 


(4.28a) 


(4.28b) 


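and the cost functional

$$J_{if}(\nu_{if}) = E\Big\{\varepsilon_i\,\eta_{if}'(t_f) Q_{if}\,\eta_{if}(t_f) + \int_{t_0}^{t_f}(\eta_{if}' Q_i \eta_{if} + u_{if}' u_{if})\,dt \ \Big|\ u_{if}(t) = \nu_{if}(t,\alpha_i)\Big\}. \tag{4.29}$$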


The  unique  optimal  solution  to  these  static  control  problems  is  given  by 

Ui£(t)  -  ^i£  ^^i~^i0^00*^ii^i0^  "  ®ii  ®i£  ^i£^®^  ‘  (4.30) 

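where $S_{if}(t)$ is the nonnegative definite solution of the Riccati equation

$$\varepsilon_i\,\dot{S}_{if} = -A_{ii}'S_{if} - S_{if}A_{ii} - Q_i + S_{if}B_{ii}B_{ii}'S_{if}; \quad S_{if}(t_f) = Q_{if}, \tag{4.31a}$$

$\Psi_{if}(t,t_0)$ is the state transition matrix of the system

$$\varepsilon_i\,\dot{\eta}_{if}(t) = (A_{ii} - B_{ii}B_{ii}'S_{if})\,\eta_{if}(t); \quad \eta_{if}(t_0) = \eta_{i0}, \tag{4.31b}$$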


^i£  "  ^ii  ^ii  ^^iO  ^00  ^iO  ^^ii  ^ii  ^ii  ■*■  ^i^ 


The minimum value of $J_{if}$ is given by

•^ii  "  '^i£^“l£^  "  ^1  ^iO  ^1£^°^  ^iO  '*■^1^'^  ^il®i£^°^^ 

+  tr(/  S,,F^^F;^dt)  +  4 


(4.31c) 


(4.32a) 


where 


4  ■  "  fj”''  ^11  >^1  +  Sif  =11  4  Sif  WiXit} 


(4.32b) 
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rt 


with 


-  2a>ii  (t.tg)  -  Bii  (t.to)  Cji)  (4.33a) 


■  -»ii  *1  =lf 
*t  ('-*0^  ■  hi  *i  <'’V'  *1  <*0’V  * 


(4.33b) 

(4.33c) 


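Under Assumption b, $S_{if}(t) \to \bar{S}_{if}$ as $\varepsilon_i \to 0$, where $\bar{S}_{if}$ is the unique positive definite solution of the algebraic Riccati equation

$$A_{ii}'\bar{S}_{if} + \bar{S}_{if}A_{ii} - \bar{S}_{if}B_{ii}B_{ii}'\bar{S}_{if} + Q_i = 0.$$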

Also,  J*j  -*  jJj  as  -  0,  where 

hi  hi  ^ 


'mf 


(4.34) 


(4.35) 


and  is  with  repUced  by 


The optimal control $u_{if}^*(t)$ tends to $\bar{u}_{if}(t)$ as $\varepsilon_i \to 0$, where $\bar{u}_{if}(t)$ is $u_{if}^*(t)$ with $S_{if}$ replaced by $\bar{S}_{if}$.
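The convergence of the singularly perturbed Riccati solution to its algebraic counterpart can be observed numerically. The sketch below assumes a differential Riccati equation of the form $\varepsilon \dot{S} = -A'S - SA - Q + SBB'S$ with terminal condition $S(t_f) = Q_f$, and integrates it backward with a plain Euler scheme. The matrices, the horizon, and the step count are illustrative assumptions, and Euler integration is used only for transparency; it is not the numerical method of the thesis.

```python
import numpy as np

def riccati_backward(A, B, Q, Qf, eps, horizon=1.0, steps=2000):
    """Euler-integrate eps*dS/dt = -A'S - SA - Q + S B B' S backward from S(t_f) = Qf.

    Returns S at time t_f - horizon.
    """
    S = Qf.copy()
    h = horizon / steps
    for _ in range(steps):
        # In reversed time tau = t_f - t the right-hand side changes sign.
        S = S + (h / eps) * (A.T @ S + S @ A - S @ B @ B.T @ S + Q)
    return S

# Hypothetical fast-subsystem data: stable A, controllable (A, B), Q > 0.
A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
B = np.array([[1.0],
              [1.0]])
Q = np.eye(2)

S0 = riccati_backward(A, B, Q, Qf=np.zeros((2, 2)), eps=0.05)

# Residual of the algebraic Riccati equation A'S + SA - S B B' S + Q = 0.
are_residual = np.max(np.abs(A.T @ S0 + S0 @ A - S0 @ B @ B.T @ S0 + Q))
print("S(t_0) for eps = 0.05:\n", S0)
print("algebraic Riccati residual:", are_residual)
```

For small $\varepsilon$ the solution leaves the terminal value within a thin boundary layer near $t_f$ and sits at the algebraic solution over the rest of the interval, which is why the fast subproblem can be solved without knowledge of $\varepsilon$.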


The multimodel strategy pair $\{u_{im}(t);\ i=1,2\}$ is formed by combining the optimal strategies of the slow and fast subproblems.


i  I' 


■  "la<'>  *  “tt"^ 

*  '  ®il  ^00  ■  =11  ^to' 


A  _ 


*  ^is  ®0i  ®s  ^Os^^^  ■  ®ii  ^If  ^if^^^ 


(4.36) 


l: 


The following proposition now establishes the well-posedness of the multimodel solution.

Proposition 4.1:

a) $u_i^*(t) = u_{im}(t) + O(\|\varepsilon\|)$; i=1,2; $t \in [t_1,t_2] \subset [t_0,t_f]$

b) $J^* = J_s^* + \sum_{i=1}^{2}\bar{J}_{if} + O(\|\varepsilon\|)$.

Proof: See Appendix C.


4.4.  Dynamic  Team  Problem 

We now obtain the solution of the dynamic team problem formulated in Section 4.2. This is done by first enlarging the strategy spaces of the DMs so as to formulate a new team problem whose optimal solution can be obtained more readily. The solution of the original problem is then obtained from the solution to the new problem.

The new team problem differs from the old one in the information patterns of the DMs. Specifically, the new one is defined by replacing $\alpha_i^j$ and $\delta_{j-1}$, given by (4.8), by $\tilde{\alpha}_i^j$ and $\tilde{\delta}_{j-1}$ respectively, where

$$\tilde{\alpha}_i^j = \{y_i(j),\ \tilde{\delta}_{j-1}\} \tag{4.37a}$$

$$\tilde{\delta}_{j-1} = \{\delta_{j-1};\ u_1(t), u_2(t),\ t < t_j\}. \tag{4.37b}$$

Under this new information pattern, the DMs also have access to each other's control values used during all past sampling intervals. This information pattern, though not of much practical importance, is mathematically convenient for obtaining the solution to the original problem due to the following fact [36]:


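$$\min_{\Gamma_1^N}\ \min_{\Gamma_2^N}\ J(\nu_1,\nu_2) = \min_{\tilde{\Gamma}_1^N}\ \min_{\tilde{\Gamma}_2^N}\ J(\tilde{\nu}_1,\tilde{\nu}_2) \tag{4.38}$$

where $\tilde{\Gamma}_1^N$ and $\tilde{\Gamma}_2^N$ are defined analogous to $\Gamma_1^N$ and $\Gamma_2^N$ respectively, but under the new information pattern.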

For each $\{\tilde{\nu}_1 \in \tilde{\Gamma}_1^N,\ \tilde{\nu}_2 \in \tilde{\Gamma}_2^N\}$, the implicit equations

$$\tilde{\nu}_i(t,\tilde{\alpha}_i^j) = u_i(t,\omega); \quad i=1,2; \quad j = N-1,\ldots,0 \tag{4.39}$$

can be solved recursively for $u_i(t,\omega)$, $j = N-1,\ldots,0$, as functions of $\{\alpha_i^j,\ j = N-1,\ldots,0;\ i=1,2\}$ because of the nature of the information pattern. Then the resulting functional relations provide a pair in $\Gamma_1^N \times \Gamma_2^N$, a unique one since the stochastic differential equation (4.5a) admits a unique solution in each sampling interval. In fact, there exist uncountably many pairs in $\tilde{\Gamma}_1^N \times \tilde{\Gamma}_2^N$ corresponding to a given pair in $\Gamma_1^N \times \Gamma_2^N$; equivalently, a pair of strategies under the original information structure has several representations under the new (enlarged) information pattern [10]. In [36] one such representation in $\tilde{\Gamma}_1^N \times \tilde{\Gamma}_2^N$ is first obtained which is the simplest to derive. Then implicit equations of the type (4.39) are solved to obtain the desired optimal team solution.


The optimal solution to the dynamic team problem involves solving an appropriate static team problem with respect to the current outputs, within each sampling interval. The shared information affects the statistics of the initial state at the beginning of each sampling interval. The computational problem worsens, since now we need to solve a set of stiff differential equations in every sampling interval. Hence, a suboptimal solution without such numerical stiffness will be much more desirable in the dynamic case.

The multimodel solution, which is one such suboptimal solution, is obtained by solving the following low-order problems.


4.4.1.  Slow  subproblem 

This is a dynamic team problem obtained by taking the limit as $\|\varepsilon\| \to 0$ in the original problem defined by (4.11)-(4.13).

The state equation for this problem is given by (4.19), the cost criterion by (4.22), and observations by


^Is^-^^  "  ^iO^Os^^J^  2  yj^(j)  - 

1-1,2;  j  e  0  .  (4.40) 

The optimal solution to the slow dynamic team problem under the new information structure (4.37) is given by [36],

$j \in \theta$. (4.41)

The unique optimal solution under the one-step-delay observation-sharing pattern is given by


^  A  A  A^  A^  A^ 

^ls^^’“l^  “  ^^l^^^"^10^0a^^^"^ll^ls^^J^^"^ls  ®01  ^3  ^Os^’^’^j^^Os^^^ 


1-1,2;  t  e  (tj.tj^^);  J  e  9. 


(4.47) 


A^  A^ 

where  ^Qg(t)»  solutions  of  (4.44)  with  u^^(t)  replaced  by 


4.4.2. Fast subproblems


These are stochastic control problems with sampled observations for each DM (i=1,2). Each one is defined by the state equation (4.27), the cost criterion (4.29) and the observations

$$y_{if}(j) = C_{ii}\,\eta_{if}(t_j) + v_i(j) = y_i(j) - C_{i0}\,\eta_{0s}(t_j) - C_{ii}\,\eta_{is}(t_j); \quad j \in \theta. \tag{4.48}$$


The unique optimal solution to these control problems is given by


A  Kic 


Uij(t)  .  Yif(t,tj)  iTl^jCtj)  +K£a)[yi(J)-C,oTljj^(tj)-Ci^Tl^^(tj) 


(4.49) 


where $\bar{S}_{if}$ is the unique positive definite solution of (4.34), and
®i  ^if  ■  ^^il"®ii®ii^if^  Yj^^(t,tj);  te  [tj,tj_|_j^);  (Cj,tj)-I 


(4.50) 


A* 

^i\f 


Air 

®li'*if’  ^  ^  ^  ...jN  j 


^if  ^^0^  *  \o  ' 


A  A* 


\f(tj)  -TlifCtj)  +K^(j)  tyia)-Cio'Jos(*^j>-‘'ii^Is(‘=j^Cii"Jf(’^j>]  * } 


,  A  _  A 


f(J)  ■  ^^icfoo^io  ■*■  ^ii^ii^ii  “*■  ^ij^  *  » 

A  A 

■"1  ®ii  '=i(?oo<'J>'io  +®iiVti  *®iji*^*  •••'»' 


Remarks: Notice the sequential nature of the slow and fast subproblems. The parameters associated with the solution of the slow subproblem are needed in the solution of the fast subproblems. This is in contrast to the static problem of the previous section, where the slow and fast subproblems were independent. This interesting feature, which is due to the dynamic nature of the problem, has been noticed elsewhere [34].

The multimodel strategy pair for the dynamic team problem $\{\nu_{im}(t,\alpha_i);\ i=1,2\}$ is formed by combining the optimal strategies of the slow and fast subproblems

$$\nu_{im}(t,\alpha_i) = \nu_{is}^*(t,\alpha_i) + \bar{\nu}_{if}(t,\alpha_i); \quad i=1,2. \tag{4.52}$$


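The following proposition now establishes the well-posedness of this multimodel solution.

Proposition 4.2

a) $\nu_i^*(t,\alpha_i) = \nu_{im}(t,\alpha_i) + O(\|\varepsilon\|)$; i=1,2;

b) $J(\nu_1^*,\nu_2^*) = J_s(\nu_{1s}^*,\nu_{2s}^*) + \sum_{i=1}^{2}\bar{J}_{if} + O(\|\varepsilon\|)$.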

Proof ;  If  we  let 


i  t 


r  t= 


■ 


t  - 


f"’ 

U- 


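where

$$\dot{\hat{x}} = A\hat{x} + B_1\nu_1(t,\alpha_1) + B_2\nu_2(t,\alpha_2); \quad t \in [t_j,t_{j+1}),$$

$$\hat{x}(t_0) = \bar{x}_0; \quad j = 1,\ldots,N$$

$$\hat{x}(t_j) = \hat{x}(t_j^-) + K(j)\,[y(j) - C\hat{x}(t_j^-)],$$

$$\dot{\Sigma}(t) = A\Sigma + \Sigma A' + FF'; \quad t \in [t_j,t_{j+1}),$$

$$\Sigma(t_0) = \Sigma_0; \quad j = 1,\ldots,N$$

$$\Sigma(t_j) = \Sigma(t_j^-) - K(j)\,C\,\Sigma(t_j^-),$$

$$K(j) = \Sigma(t_j^-)\,C'\,[C\,\Sigma(t_j^-)\,C' + R_j]^{-1},$$

$$R_j = \text{diag}\,(R_{1j}, R_{2j}),$$

$$y(j) = [y_1'(j),\ y_2'(j)]'; \quad j = 1,2,\ldots,N,$$

$$C = [C_1',\ C_2']',$$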


then it is straightforward to notice that


x*(t)  -  -A"J  B,^  v*^  (t,Qf^)  +  +  0(iiej!);  i.1,2  . 


c  e  c  j  €  e 


The rest of the proof is analogous to the proof of Proposition 4.1 and is therefore omitted here.

The approximation in (a) is valid only on subintervals of the sampling interval due to the fact that we have neglected the boundary-layer terms.
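The estimator recursion used in the proof alternates continuous-time propagation between samples with a discrete correction at each sampling instant. As a hedged illustration, the sketch below implements only the standard discrete Kalman measurement update (gain, corrected estimate, corrected covariance) for stacked observations of two DMs. The function name and the numerical values are assumptions chosen so the result can be checked by hand; they are not taken from the thesis.

```python
import numpy as np

def measurement_update(x_pred, P_pred, y, C, R):
    """Standard discrete Kalman update at a sampling instant.

    x_pred, P_pred: estimate and covariance just before the sample.
    Returns the corrected estimate, corrected covariance, and gain.
    """
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = P_pred - K @ C @ P_pred
    return x_new, P_new, K

# Hypothetical two-state example; each DM observes one component.
C1 = np.array([[1.0, 0.0]])
C2 = np.array([[0.0, 1.0]])
C = np.vstack([C1, C2])            # stacked observation matrix
R = np.diag([1.0, 1.0])            # block-diagonal noise covariance
x_pred = np.zeros(2)
P_pred = np.eye(2)
y = np.array([2.0, 4.0])           # stacked samples of both DMs

x_new, P_new, K = measurement_update(x_pred, P_pred, y, C, R)
print("gain:\n", K)
print("corrected estimate:", x_new)
print("corrected covariance:\n", P_new)
```

With unit prior covariance and unit noise covariance the gain is one half of the identity, so the corrected estimate lies midway between the prediction and the observation.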


4.5.  Conclusions 

We  have  obtained  multimodel  solutions  to  LQG  team  problems  under 
the  static  information  structure  and  also  dynamic  information  structure  with 
one-step-delay  observation-sharing  pattern.  In  both  cases  the  multimodel 
solution  is  shown  to  provide  an  arbitrarily  close  approximation  to  the 
optimal  solution. 

The  advantages  of  using  the  multimodel  solution  are  apparent. 
Instead  of  solving  one  large-dimensioned  team  problem  which  is  numerically 
ill-conditioned,  the  DMs  need  only  solve  one  low-order  team  problem,  which 
does  not  depend  on  the  small  uncertain  parameters,  and  two  low-order  control 
problems.  These  control  problems  can  be  solved  independently  by  each  DM. 
Hence,  each  DM  need  not  know  the  parameters  associated  with  the  low-order 
control  problem  of  the  other  DM.  This  implies  that  the  multimodel  solution 
is  robust  with  respect  to  modeling  errors  on  the  part  of  each  DM;  a  very 
desirable  feature  in  large  scale  system  design. 

The  results  of  this  chapter  again  demonstrate  the  richness  in  the 
modeling  structure  with  multiparameter  singular  perturbations  in  the  context 
of multimodeling problems. The limit of the seemingly complex integro-differential equations associated with the optimal solution has a nice, appealing structure when rearranged and interpreted as a multimodel solution.

Here we have assumed that the sampling period is fixed and predetermined. Had we made the sampling period T a function of the small parameters, such that $T(\varepsilon) \to 0$ as $\|\varepsilon\| \to 0$, we would not have been able to preserve the one-step-delay observation-sharing pattern in the limit, because the observations become continuous. One way to get around this difficulty would be to make separate observations of the slow and fast variables and let the sampling period of the fast observations be a function of $\varepsilon$. Apparently, this should cause no problems in the asymptotic analysis because the fast subproblems would become continuous stochastic control problems in the limit as $\|\varepsilon\| \to 0$. But it is not clear whether the slow dynamic team problem would require the sharing of the sampled slow observations alone.

Of  course,  in  such  a  case,  one  will  first  have  to  formulate  appropriately 
and  solve  the  dynamic  team  problem  with  multirate  sampled  observations. 

From  practical  considerations,  our  approach  here  should  cause  no 
problems  because  the  small  uncertain  parameters  are  nonzero.  This  means  that 
in  practice  the  fast  variables  are  not  infinitely  fast  but  have  a  finite 
bandwidth,  and  one  can  always  choose  an  appropriate  sampling  period  from 
physical  considerations. 


CHAPTER  5 


MULTIMODELING IN MARKOVIAN DECISION PROBLEMS

5.1.  Introduction 

In the previous chapters we have examined multimodel solutions for two-time-scale systems modeled using multiparameter singular perturbations. In [43,44], the theory of time-scale decomposition has been extended to probabilistic Markov chain models where 'slow' and 'fast' eigenmodes correspond to 'weak' and 'strong' transition probabilities. This chapter focuses on obtaining near-optimal policies for controlled Markov models consisting of N weakly-coupled groups of strongly-interacting states. A hierarchical algorithm, which allows for multimodeling on the part of the decision makers, is proposed for computing these near-optimal policies.

The existing results on Markov games [65] do not provide us with a sufficient background to address the multimodeling problem directly. For this reason, we begin by formulating the general N-person average-cost-per-stage problem with state information in Section 5.2. In Section 5.3, the optimality conditions for stationary Feedback Nash and Stackelberg policies are derived. The computational difficulties associated with the feedback policies are discussed in Section 5.4. In Section 5.5 we consider Stackelberg problems when the leader, in addition to the current state of the process, also has access to the followers' controls at every stage [48-50]. An efficient computational algorithm is proposed for computing incentive policies which help the leader to achieve his global optimum. In Section 5.6, we propose a hierarchical algorithm for computing near-optimal incentive policies for weakly-coupled Markov chains, which allow the 'local' decision makers to use multiple simplified models. Section 5.7 illustrates the well-posedness of the multimodel solution through a numerical example. Finally, the chapter concludes with Section 5.8.


5.2.  The  N-person  Markov  Decision  Problem  with  State  Information 

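Consider a finite state Markov chain $x_t$, $t = 0,1,2,\ldots$ with state space $\{1,2,\ldots,n\}$, controlled by N decision makers with decision variables $\{u_\ell;\ \ell=1,2,\ldots,N\}$. The transition probability of the Markov chain at time t depends upon the decisions $\{u_\ell;\ \ell=1,2,\ldots,N\}$ chosen at t, i.e., on prob$\{x_{t+1} \mid x_t, u_1, \ldots, u_N\}$. The state $x_t$ is observed at each t and $\{u_\ell;\ \ell=1,2,\ldots,N\}$ may depend on it. Hence, we are concerned with feedback strategies $\{\nu_\ell(x(t));\ \ell=1,2,\ldots,N\}$. If $x_t = i$, then any decision $\{u_\ell \in U_\ell(i) \subset \mathbb{R}^{m_\ell};\ \ell=1,2,\ldots,N\}$ may be used. A stationary strategy is any element $\nu \in \Gamma$; $\nu = \{\nu_\ell = (u_\ell(1),u_\ell(2),\ldots,u_\ell(n)) \in \Gamma_\ell = U_\ell(1) \times \ldots \times U_\ell(n),\ \ell=1,2,\ldots,N\}$, $\Gamma = \{\Gamma_\ell;\ \ell=1,2,\ldots,N\}$. If $x_t = i$, and $\{u_\ell(i);\ \ell=1,2,\ldots,N\}$ is used, then

$$p_{ij}(u_1(i),\ldots,u_N(i)) = \text{prob}\,\{x_{t+1} = j \mid x_t = i\}$$

where $p_{ij}: U_1(i) \times U_2(i) \times \ldots \times U_N(i) \to \mathbb{R}$ are such that

$$p_{ij} \ge 0, \quad \sum_j p_{ij} = 1.$$

For $\{\nu_\ell \in \Gamma_\ell;\ \ell=1,2,\ldots,N\}$, $P(\nu)$ denotes the $n \times n$ transition probability matrix $\{p_{ij}(u_1(i),\ldots,u_N(i))\}$. Note that the i-th row of $P(\nu)$ depends only on $u(i) = (u_1(i),\ldots,u_N(i))$.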

Over the long run, each decision maker incurs an expected cost per unit time given by

$$J_\ell(\nu) = \lim_{T \to \infty} \frac{1}{T}\,E \sum_{t=0}^{T-1} f_\ell(x_t, u_1(x_t), \ldots, u_N(x_t)); \quad \ell = 1,2,\ldots,N. \tag{5.1}$$


The following assumption will be in force for the rest of the chapter.

Assumption A:

1) The admissible decision spaces $U_\ell(i)$ are compact.

2) The $p_{ij}$ are continuous functions.

3) For each i, $\{f_\ell(i,\cdot): U_1(i) \times \ldots \times U_N(i) \to \mathbb{R};\ \ell=1,2,\ldots,N\}$ are continuous functions.

4) For each $\nu \in \Gamma$, the Markov chain $x_t$ is strongly ergodic.

Assumptions A2 and A4 imply that for each $\nu \in \Gamma$, there is a unique probability row vector $\pi(\nu) = (\pi_1(\nu),\ldots,\pi_n(\nu))$ such that

$$\pi(\nu) = \pi(\nu)P(\nu); \quad \pi(\nu) > 0; \tag{5.2}$$

furthermore, $\pi(\nu)$ is continuous.

It can be shown [39] that $J_\ell(\nu)$ does not depend on the initial state and is given more simply as

$$J_\ell(\nu) = \pi(\nu)F_\ell(\nu); \quad \ell = 1,2,\ldots,N \tag{5.3}$$

where

$$F_\ell(\nu) = [f_\ell(1,u(1)),\ f_\ell(2,u(2)),\ \ldots,\ f_\ell(n,u(n))]'.$$

Furthermore, $J_\ell(\nu)$ is continuous by Assumption A. It is convenient to introduce the generator

$$Q(\nu) = P(\nu) - I.$$

Then, $Q(\nu)1_n = 0$; $1_n = (1,1,\ldots,1)'$, and $\pi(\nu)$ is the unique solution of

$$\pi(\nu)Q(\nu) = 0, \quad \pi(\nu)1_n = 1. \tag{5.4}$$

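The following result is well-known [39].

Lemma 5.1:

Let Assumption A hold. For $\nu \in \Gamma$ consider the linear equations

$$\alpha_\ell 1_n = Q(\nu)C_\ell + F_\ell(\nu); \quad \ell = 1,2,\ldots,N. \tag{5.5}$$

i) If $\{\alpha_\ell, C_\ell;\ \ell=1,2,\ldots,N\}$ is a solution, then $\alpha_\ell = J_\ell(\nu)$.

ii) If $\{\alpha_\ell, C_\ell;\ \ell=1,2,\ldots,N\}$ is a solution, then so is $\{\alpha_\ell, C_\ell + \delta 1_n;\ \ell=1,2,\ldots,N\}$ for every $\delta$.

iii) A solution always exists.

Let $Q_i(\nu)$ be the i-th row of $Q(\nu)$. It depends only on $u(i)$. For any $C_\ell \in \mathbb{R}^n$ define

$$H_\ell(C_\ell,\nu) = Q(\nu)C_\ell + F_\ell(\nu); \quad \ell = 1,2,\ldots,N \tag{5.6a}$$

$$H_\ell^i(C_\ell,u(i)) = Q_i(u(i))C_\ell + f_\ell(i,u(i)); \quad \ell = 1,2,\ldots,N. \tag{5.6b}$$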

We shall obtain the optimality conditions for the case N=2 only; the optimality conditions for N > 2 can then be obtained in a straightforward manner.

Assumption N:

i) $U_\ell(i)$ is convex; $i = 1,2,\ldots,n$; $\ell = 1,2$.

ii) For any $C_\ell$, $H_\ell^i(C_\ell,u(i))$ is strictly convex in $u_\ell(i)$; $i = 1,2,\ldots,n$; $\ell = 1,2$.

Assumptions A and N guarantee the existence of the Nash solution.

For the Stackelberg problem we shall assume that DM-1 is the leader and DM-2 is the follower. The following assumption together with Assumption A guarantees the existence of the Stackelberg solution.

Assumption S:

i) $U_2(i)$ is convex; $i = 1,2,\ldots,n$.

ii) For any $C_2$, $H_2^i(C_2,u(i))$ is strictly convex in $u_2(i)$; $i = 1,2,\ldots,n$.

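Let us consider the Nash solution first. For any $(C_1,C_2)$ define

$$R_\ell^i(u_k(i),C_\ell) = \Big\{u_\ell^0(i) \in U_\ell(i):\ H_\ell^i(C_\ell,u_\ell^0(i),u_k(i)) = \min_{u_\ell(i) \in U_\ell(i)} H_\ell^i(C_\ell,u_\ell(i),u_k(i))\Big\}$$

$$u_\ell^0(i) = R_\ell^i(u_k^0(i),C_\ell); \quad \ell \ne k; \quad i = 1,2,\ldots,n \tag{5.7a}$$

$$N_\ell^i(C_1,C_2) = H_\ell^i(C_\ell,u^0(i)); \quad i = 1,2,\ldots,n; \quad \ell = 1,2. \tag{5.7b}$$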
The following theorem gives necessary and sufficient conditions satisfied by the Nash strategy.

Theorem 5.1:

Under Assumptions A and N, $(\nu_1^*,\nu_2^*)$ is Feedback Nash iff there exist $\{(\alpha_\ell^*,C_\ell^*);\ \ell=1,2\}$ such that

$$\alpha_\ell^* 1_n = N_\ell(C_1^*,C_2^*); \quad \ell = 1,2.$$

Moreover, $\alpha_\ell^* = J_\ell(\nu_1^*,\nu_2^*)$; $\ell = 1,2$.

Proof: Follows from Theorem 3.4 of [39] by holding $\nu_1$ fixed in DM-2's optimization problem and vice versa.

We now consider the Stackelberg problem. For each announced strategy $\nu_1 \in \Gamma_1$ of the leader, the follower determines his response by minimizing $J_2(\nu_1,\nu_2)$ over $\Gamma_2$. The set of solutions

$$R(\nu_1) = \Big\{\nu_2^0 \in \Gamma_2:\ J_2(\nu_1,\nu_2^0) \le \min_{\nu_2 \in \Gamma_2} J_2(\nu_1,\nu_2)\Big\} \tag{5.8}$$

is known as the Rational Reaction set of the follower. Assumption S guarantees that $R(\nu_1)$ is a singleton, and therefore we have the unique mapping $R: \Gamma_1 \to \Gamma_2$. A strategy $\nu_1^*$ is a Stackelberg strategy for the leader if

$$J_1(\nu_1^*,R\nu_1^*) \le J_1(\nu_1,R\nu_1); \quad \forall\,\nu_1 \in \Gamma_1. \tag{5.9}$$

The optimal strategy for the follower is $\nu_2^* = R\nu_1^*$. For any $(C_1,C_2)$ define


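$$R_2^i(u_1(i),C_2) = \Big\{u_2^0(i) \in U_2(i):\ H_2^i(C_2,u_1(i),u_2^0(i)) = \min_{u_2(i) \in U_2(i)} H_2^i(C_2,u_1(i),u_2(i))\Big\}$$

$$u_1^0(i) = \arg\min_{u_1(i) \in U_1(i)} H_1^i(C_1,u_1(i),R_2^i(u_1(i),C_2))$$

$$u_2^0(i) = R_2^i(u_1^0(i),C_2)$$

$$S_\ell^i(C_1,C_2) = H_\ell^i(C_\ell,u_1^0(i),u_2^0(i)); \quad i = 1,2,\ldots,n; \quad \ell = 1,2. \tag{5.10}$$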
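The leader-follower structure at a single state can be sketched on finite decision grids: the follower's rational reaction is an argmin over his own decision for each leader decision, and the leader minimizes his own cost along that reaction. The code below is a minimal illustration under stated assumptions. The function `stagewise_stackelberg`, the cost functions `H1` and `H2`, and the grid are all hypothetical, and the grid search stands in for the minimization over compact convex sets in the text.

```python
def stagewise_stackelberg(H1, H2, U1, U2):
    """Leader (DM-1) minimizes H1 along the follower's reaction, on finite grids."""
    def reaction(u1):
        # Follower's rational reaction to the announced leader decision u1.
        return min(U2, key=lambda u2: H2(u1, u2))

    u1_star = min(U1, key=lambda u1: H1(u1, reaction(u1)))
    return u1_star, reaction(u1_star)

# Hypothetical stage costs: the follower wants to track the leader,
# the leader wants u1 near 1 while keeping the follower's decision small.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
H1 = lambda u1, u2: (u1 - 1.0) ** 2 + u2 ** 2
H2 = lambda u1, u2: (u2 - u1) ** 2

u1_star, u2_star = stagewise_stackelberg(H1, H2, grid, grid)
print("leader decision:", u1_star, "follower reaction:", u2_star)
```

Here the follower's reaction is u2 = u1, so the leader effectively minimizes (u1 - 1)^2 + u1^2 and settles at 0.5. Strict convexity of the follower's cost in his own decision is what makes the reaction single-valued, mirroring the role of Assumption S.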


The following theorem gives necessary and sufficient conditions satisfied by the Stackelberg strategy.

Theorem 5.2:

Under Assumptions A and S, $(\nu_1^*,\ \nu_2^* = R\nu_1^*)$ is Feedback Stackelberg iff there exist $\{(\alpha_\ell^*,C_\ell^*);\ \ell=1,2\}$ such that

$$\alpha_\ell^* 1_n = S_\ell(C_1^*,C_2^*); \quad \ell = 1,2.$$

Moreover, $\alpha_\ell^* = J_\ell(\nu_1^*,\nu_2^*)$; $\ell = 1,2$.

Proof;  1)  Sufficiency: 

Let  there  exist  {(ajj»C^)  ;  1*1,2}  such  that 


then. 


«>n  - 


TT(v*,V2)aX  "  ®1  “  V  H/(C*,v*,v*) 


'V''2'  “1^  I’d’ ''2' 


^  ^  •ff  ^ 


if  ic 

J  A  (Vi  >  V-  ) 


Since  ^2) »  we  should  have  equality  above.  Hence, 


tt(v^,V2)  [S^(C2^,C2)  “  Ofil^]  “  0 


>^1  n^ 


Since  tt(vj^>V2)  >  0  by  Assumption  A4,we  have 


*  * 


S^(C^,C2)  =  ci^l^ 


Now,  for  the  follower^ let  V2  €r2  be  such  that 


^  ^  it  “is 

H2 (C2 , Vj^ , V2 )  ~  “  ^2^n 


Therefore, 


^  it  it  it  it 

d2Cv^,V2)  *  Vj^> V2 )^2 ^^2 * ^1* ^2 ^  ^  ^2  * 

*  It  it 

Since  Q’2  “  J2''^1’'^2^’  should  have  equality  above. 
Hence, 


tt(v*,V2)[S2(C*,C2)  -  »  0 


Since  n(vj^>V2)  >  0  by  Assumption  A4,  we  have 


★  *  * 
^2^^1’S^  -  a2ln 


Although  the  above  theorems  have  been  proved  under  the  strong 
ergodicity  assumption,  it  is  believed  that  they  hold  even  when  the  Markov 
chain  is  simply  ergodic.  The  proof  for  the  necessity  part  without  the 
strong  ergodicity  assumption  will  be  more  involved. 


Let us define,

$\underline{N}_\ell(C_1, C_2) = \min_i N_\ell^i(C_1, C_2)$

$\overline{N}_\ell(C_1, C_2) = \max_i N_\ell^i(C_1, C_2)$ ; $\ell = 1,2$ ;

then the following hold:

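Lemma 5.2:

For any $(C_1, C_2)$ let $(v_1^o, v_2^o)$ be such that

$S_\ell(C_1, C_2) = H_\ell(C_\ell, v_1^o, v_2^o)$ ; $\ell = 1,2$.

Then,

$\underline{S}_\ell(C_1, C_2) \le J_\ell(v_1^o, v_2^o) \le \overline{S}_\ell(C_1, C_2)$ ;

$(v_1^o, v_2^o)$ is Stackelberg if $\underline{S}_\ell(C_1, C_2) = \overline{S}_\ell(C_1, C_2)$ ; $\ell = 1,2$.

Proof: $\underline{S}_\ell(C_1, C_2) \le \pi(v_1^o, v_2^o) S_\ell(C_1, C_2) = J_\ell(v_1^o, v_2^o) \le \overline{S}_\ell(C_1, C_2)$.
If $\underline{S}_\ell(C_1, C_2) = \overline{S}_\ell(C_1, C_2)$, then every component of $S_\ell(C_1, C_2) = H_\ell(C_\ell, v_1^o, v_2^o)$ equals this common value, and $(v_1^o, v_2^o)$ is Stackelberg by Theorem 5.2.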
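The bounding argument in the proof rests on the fact that a stationary distribution averages the components of a vector, so the average lies between the smallest and the largest component. A minimal numerical sketch of this follows, assuming (as the form $Q C + F$ used later in this section suggests) that for a fixed policy $H = Q C + F$ with $Q$ a generator whose rows sum to zero. The two-state chain, the cost vector and the function names are hypothetical.

```python
def h_vector(Q, C, F):
    """Componentwise H = Q C + F for a fixed policy and dual variable C."""
    return [sum(q * c for q, c in zip(row, C)) + f for row, f in zip(Q, F)]


# Hypothetical two-state generator with rates a (1 -> 2) and b (2 -> 1).
a, b = 1.0, 2.0
Q = [[-a, a], [b, -b]]
F = [3.0, 0.0]

# For a two-state chain the stationary distribution is known in closed form.
pi = [b / (a + b), a / (a + b)]
J = sum(p * f for p, f in zip(pi, F))  # average cost pi F


def bounds(C):
    """Smallest and largest component of H for the dual variable C."""
    h = h_vector(Q, C, F)
    return min(h), max(h)


# An arbitrary C brackets J; the C below makes both bounds coincide with J.
lower, upper = bounds([5.0, -4.0])
tight_lower, tight_upper = bounds([0.0, (J - F[0]) / a])
```

Because $\pi Q = 0$, we have $\pi H = \pi F = J$ for every $C$, which is why the bracket holds for any choice of the dual variable and collapses only when $H$ has equal components.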

Lemma  5.3: 

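For any $(C_1, C_2)$ let $(v_1^o, v_2^o)$ be such that


$\ell = 1,2$.


Then,

$\underline{N}_\ell(C_1, C_2) \le J_\ell(v_1^o, v_2^o) \le \overline{N}_\ell(C_1, C_2)$ ; $\ell = 1,2$.

$(v_1^o, v_2^o)$ is Nash if $\underline{N}_\ell(C_1, C_2) = \overline{N}_\ell(C_1, C_2)$.

Proof: Similar to that of Lemma 5.2.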

Notice that unlike the control problem [39], we cannot bound the optimal costs ($J_\ell^*$ ; $\ell = 1,2$) in the Nash and Stackelberg problems by the quantities defined in (5.11). This fact makes it difficult to obtain computational algorithms for the multiple decision maker problems along the lines of the control problem [39,40], as we shall see next.

5.4. Computational Aspects

One way to compute the Feedback Nash and Stackelberg policies is to deal directly with equation (5.3) of the cost function. The Nash solution can be computed by obtaining the point of intersection of the reaction curves. The Stackelberg solution can be obtained by applying the algorithm of [51] for static problems. A serious drawback of this direct approach, which makes it computationally infeasible, is that we first need to obtain the steady state distribution $\pi(v)$ as a function of $v \in \Gamma$. This is very difficult in practice when the Markov chain is of very high dimension and the admissible control sets $U_\ell(i)$ are uncountable.
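To make the cost of the direct approach concrete, the sketch below computes the steady state distribution for one fixed policy by solving a linear system, and then evaluates the average cost. It assumes that $\pi$ solves $\pi Q = 0$, $\pi 1_n = 1$ for a generator $Q$ with rows summing to zero and that the average cost is $\pi F$. The function names and the example data are hypothetical. The direct approach would have to repeat this solve for every candidate policy $v$.

```python
def stationary_distribution(Q):
    """Solve pi Q = 0, sum(pi) = 1 for an ergodic generator Q (rows sum to 0)."""
    n = len(Q)
    # Augmented system for Q^T pi^T = 0; the last equation is replaced
    # by the normalization sum(pi) = 1.
    M = [[Q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    M[n - 1] = [1.0] * n + [1.0]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def average_cost(Q, F):
    """Average cost pi F of the fixed policy that induces Q and F."""
    pi = stationary_distribution(Q)
    return sum(p * f for p, f in zip(pi, F))


# Hypothetical two-state example.
Q = [[-1.0, 1.0], [2.0, -2.0]]
F = [3.0, 0.0]
pi = stationary_distribution(Q)
J = average_cost(Q, F)
```

The elimination costs on the order of $n^3$ operations per policy, which is the practical obstacle the text refers to when the chain is large and the control sets are not finite.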

An alternative approach which does not involve computing $\pi(v)$ is to work with dual variables and make use of the results of Lemmas 5.1, 5.2, 5.3 and Theorems 5.1, 5.2. The policy iteration [52] and dual variable iteration [39,40] algorithms for the control problem also involve working with linear equations of the type (5.5) rather than computing $\pi(v)$ and dealing directly with (5.3).

Let us consider the Nash problem first. For any $(C_1^k, C_2^k)$ we find $v_1^k$ and $v_2^k$ from (5.7a) and $N_\ell(C_1^k, C_2^k)$ from (5.7b). We need to update $(C_1^k, C_2^k)$ such that

$\lim_{k \to \infty} [\overline{N}_\ell(C_1^k, C_2^k) - \underline{N}_\ell(C_1^k, C_2^k)] = 0$ ; $\ell = 1,2$.

Then, in the limit, we obtain the Nash solution by Theorem 5.1 and Lemma 5.3. Since we cannot bound $J_\ell^*$ by $\overline{N}_\ell$ and $\underline{N}_\ell$ at every iteration, convergence cannot be guaranteed if the algorithm of [39] is used to update the dual variables $(C_1^k, C_2^k)$. But, if the algorithm does converge, then the convergent point is the Nash equilibrium.

If it is known a priori that the Nash equilibrium is stable [53], then we can use the following policy iteration algorithm which converges to the Nash solution.

Step 1: Choose $\{v_\ell^0 \in \Gamma_\ell ; \ell = 1,2\}$.

Step 2: Obtain $v_\ell^{k+1}$ by applying the algorithm of [39] to the following optimization problem.


Step 3: If $v_\ell^{k+1} = v_\ell^k$, $\ell = 1,2$, then stop; otherwise set $k \leftarrow k+1$ and go back to Step 2.
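The structure of such an iteration can be sketched on a finite static game. The sketch assumes that the optimization problem in Step 2 is each player's best response to the other player's current strategy, with both players updating simultaneously; this reading, the cost tables and the function name are assumptions for illustration, and the inner minimization is done by enumeration rather than by the algorithm of [39].

```python
def best_response_iteration(J1, J2, v1=0, v2=0, max_iter=100):
    """Iterate simultaneous best responses; return a fixed point or None."""
    for _ in range(max_iter):
        # Player 1 minimizes J1 over rows, given player 2's current column.
        new_v1 = min(range(len(J1)), key=lambda r: J1[r][v2])
        # Player 2 minimizes J2 over columns, given player 1's current row.
        new_v2 = min(range(len(J2[0])), key=lambda c: J2[v1][c])
        if (new_v1, new_v2) == (v1, v2):
            return v1, v2  # neither player wants to deviate: Nash point
        v1, v2 = new_v1, new_v2
    return None  # no convergence within max_iter


# Hypothetical cost tables: rows are player 1 strategies, columns player 2.
J1 = [[4.0, 2.0], [3.0, 1.0]]
J2 = [[1.0, 3.0], [5.0, 2.0]]
nash = best_response_iteration(J1, J2)
```

A fixed point of the iteration satisfies both Nash inequalities by construction. Without the stability assumption the iteration may cycle, which is why the function returns `None` after `max_iter` passes instead of looping forever.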

We now consider the Stackelberg problem. By the very nature of the problem we cannot have any algorithm based on policy iterations. So any iterative algorithm has to iterate on either one or both dual variables. Consider the following algorithm which involves iterating on both variables. For any $(C_1^k, C_2^k)$ we find $S_\ell(C_1^k, C_2^k)$ from (5.10). If we can update $(C_1^k, C_2^k)$ such that

$\lim_{k \to \infty} [\overline{S}_\ell(C_1^k, C_2^k) - \underline{S}_\ell(C_1^k, C_2^k)] = 0$ ; $\ell = 1,2$ ;

then in the limit we obtain the Stackelberg solution by Theorem 5.2 and Lemma 5.2. For the same reason as in the Nash case, we cannot use the algorithm of [39] to update $(C_1^k, C_2^k)$ and guarantee convergence.

It is not possible to develop an algorithm based on updating the leader's dual variable alone. But consider the following algorithm which involves iterating on the follower's dual variable.

Step 1: Choose $C_2^0$.

Step 2: Find

$f^k(v_1) = \arg\min_{v_2 \in \Gamma_2} \{ Q(v_1, v_2) C_2^k + F_2(v_1, v_2) \}$.

Step 3: Obtain $v_1^k$ by applying the algorithm of [39] to the following optimization problem.

$\min_{v_1 \in \Gamma_1} \{ Q(v_1, f^k(v_1)) C_1 + F_1(v_1, f^k(v_1)) \}$

Step 4: Find $v_2^k = f^k(v_1^k)$, and

$h(C_2^k) = Q(v_1^k, v_2^k) C_2^k + F_2(v_1^k, v_2^k)$.

Step 5: Let $\overline{h}(C_2^k) = \max_i h_i(C_2^k)$ and $\underline{h}(C_2^k) = \min_i h_i(C_2^k)$. Update $C_2^k \to C_2^{k+1}$ such that

$\overline{h}(C_2^{k+1}) - \underline{h}(C_2^{k+1}) < \overline{h}(C_2^k) - \underline{h}(C_2^k)$.

If $\overline{h}(C_2^k) - \underline{h}(C_2^k) < \delta$, where $\delta$ is a sufficiently small positive real number, then stop; otherwise let $k \leftarrow k+1$ and go back to Step 2.

It is very difficult in general to update $C_2^k$ in the desired way because of the implicit dependence of $h(C_2^k)$ on $C_2^k$ via Steps 2 and 3. Due to this dependence, the algorithm of [39] cannot be used to update $C_2^k$ and guarantee convergence. But if we do have convergence, then the limiting solution is Stackelberg by construction.


Incentive Policies in Stackelberg Problems


We shall now obtain stationary Stackelberg strategies when the leader, in addition to knowing the current state of the process, also has access to the follower's decision variables. Under such an information pattern, the leader has a potential to force the follower to cooperate in achieving his global optimum. Due to the nature of the information pattern, although the leader declares his strategy first, he actually acts after the follower has made his move at every stage of the game [48]. We shall consider such Stackelberg problems with one leader and N followers playing Nash, and give an algorithm for computing an affine incentive strategy for the leader which helps him achieve his global optimum. Player 0 is assumed to be the leader and players 1, 2,..,N are assumed to be the followers.

The  following  assumption  is  made  to  guarantee  a  solution  to  the 
new  Stackelberg  problem. 


Assumption RS:

i) $U_\ell(i)$ is convex; $i = 1,2,.,n$ ; $\ell = 1,2,..,N$.

ii) For any $C_\ell$, $H_\ell^i(C_\ell, u_0(i), u_1(i), .., u_N(i))$ is strictly convex in $u_0(i)$ and $u_\ell(i)$ ; $i = 1,2,..,n$ ; $\ell = 1,2,..,N$.

The  leader's  problem  is  solved  in  the  following  steps. 

Step  1;  Obtain  the  global  optimum  of  by  solving 


min  min 


(5.12) 


This can be done by applying the algorithm of [39,40]. Denote the minimizing solution by

$\hat{v}_i = [\hat{u}_i(1), .., \hat{u}_i(n)]$ ; $i = 0,1,.,N$.


Step  2;  Choose  the  leader's  strategy  as 


(5.13) 


$P_i = \mathrm{diag}[P_{i1}, P_{i2}, .., P_{in}]$

This strategy has the open-loop value $\hat{v}_0$ whenever the followers are forced to play $\{v_i = \hat{v}_i ; i = 1,2,.,N\}$. Since $(\hat{v}_0, \hat{v}_1, .., \hat{v}_N)$ is the desired open-loop solution for the leader, the $P_i$ are chosen such that the followers' optimal reaction is $\{v_i = \hat{v}_i ; i = 1,2,.,N\}$.

step  3:  Solve  the  linear  equations  for  ;  i-l,2..,N} 


Step  4;  Obtain  {P^;  i-l,2,.,N}  from  the  gradient  equations  of  [50],  which 
in  our  case  can  be  written  as  t 


* 


P/i'^u  (i)®i^^i*“o^^^  •  .Ojj(i) ) 


i 


i-l,2,.,n;  i»l,2,.,N. 


The  leader  declares  to  follower-i,  and  [vj;  j"l».|N;  j  ^  i}.  Then, 
follower-i  solves  the  optimization  problem 


“i^n 


min 

V|€  r 

i  j 


{Q( 


'^o* 


''!»•»  Vi*  V  Vi’ •*  VV  ’ ''1”"  Vi’ Wi’ V  ^ 


and obtains his optimal strategy $v_i = \hat{v}_i$ by applying the algorithm of [39,40].

Notice  that  the  Stackelberg  solution  of  this  section  is 
computationally  easier  to  obtain  than  the  Stackelberg  solution  of  the 
previous  section. 


Now we shall consider the incentive design problem in the context of large Markov chains consisting of N weakly-coupled groups of strongly-interacting states. Such models arise naturally in the modeling of reservoir dynamics in hydro-scheduling problems [41,42] and queueing network models of computer systems [46,47]. We shall assume that transitions from each group are controlled by a single decision maker having his own performance objective and the overall system is coordinated by a leader whose objective is to optimize some global system performance. The computational algorithm for obtaining the near-optimal policies will be shown to exhibit multimodel features, i.e., each lower level decision maker, in order to compute his near-optimal policy, need only know his 'local' dynamics and some 'aggregate' of the rest of the system.

Weakly-coupled Markov chains are described by the generator matrix $A + \epsilon B$ [43,44], where A and B are both n-dimensional Markov generators having the form


(5.14)


with $\{A_j, j = 1,2,.,N\}$ being $n_j$-dimensional Markov generators.

Thus the Markov chain consists of N groups of strongly-interacting states. The weak interactions between states in different groups are modeled as multiples of a small positive scalar $\epsilon$.
For the decision problem to be considered we have

$v_j = [u_j(n_1 + .. + n_{j-1} + 1), .., u_j(n_1 + .. + n_j)]$ ;    (5.15)

where, as before, player 0 is the leader and players 1,2,.,N are the followers playing Nash.

The cost vectors of the decision makers are of the form

$F_\ell(v_0, v_1, ., v_N) = \begin{bmatrix} f_\ell(1, u_1(1), u_0(1)) \\ \vdots \\ f_\ell(n_1, u_1(n_1), u_0(n_1)) \\ f_\ell(n_1+1, u_2(n_1+1), u_0(n_1+1)) \\ \vdots \\ f_\ell(n_1+n_2, u_2(n_1+n_2), u_0(n_1+n_2)) \\ \vdots \\ f_\ell(n, u_N(n), u_0(n)) \end{bmatrix}$ ; $\ell = 0,1,2,.,N$.    (5.16)

The following assumption is made about the process.


Assumption B:

1) For all $\{v_i \in \Gamma_i ; i = 0,1,.,N\}$ and $0 < \epsilon < \epsilon^*$, the Markov process defined by $A + \epsilon B$ has a single ergodic class.

2) For each $v_0 \in \Gamma_0$ and $v_j \in \Gamma_j$, the Markov process defined by $A_j(v_0, v_j)$ has a single ergodic class.

Assumption B2 implies that each $A_j(v_0, v_j)$ has one zero eigenvalue. The corresponding right eigenvector $t_j$ is the $n_j$-dimensional column made of ones. The left eigenvector $V_j(v_0, v_j)$ is the $n_j$-dimensional row of stationary probabilities for the states in the j-th group when $\epsilon = 0$.
Let,

$V(v_0, v_1, ., v_N) = \mathrm{diag}[V_1(v_0, v_1), .., V_N(v_0, v_N)]$ ; $T = \mathrm{diag}[t_1, .., t_N]$    (5.17)


It has been shown in [44] that the n-dimensional probability vector p of the Markov process can be approximated by

$p = \eta V + O(\epsilon)$    (5.18)

where $\eta$ is the N-dimensional probability vector of the aggregate Markov process with generator $VBT$ describing the transitions between different groups.

Let $\pi(v, \epsilon)$ and $\bar{\pi}(v)$ be the unique solutions of

$\pi(v, \epsilon) [A(v) + \epsilon B(v)] = 0$ ; $\pi(v, \epsilon) 1_n = 1$    (5.19)

$\bar{\pi}(v) V(v) B(v) T = 0$ ; $\bar{\pi}(v) 1_N = 1$    (5.20)

where $v = (v_0, v_1, ., v_N)$. Then we have,

$\pi(v, \epsilon) = \bar{\pi}(v) V(v) + O(\epsilon)$    (5.21)

For any given policy $v \in \Gamma$, the average cost per stage can be approximated as,

$J_\ell(v, \epsilon) = \pi(v, \epsilon) F_\ell(v) = \bar{\pi}(v) V(v) F_\ell(v) + O(\epsilon) = \bar{J}_\ell(v) + O(\epsilon)$ ; $\ell = 0,1,2,.,N$.    (5.22)

$\bar{J}_\ell(v)$ is the average cost per stage associated with the aggregate chain and

$\bar{J}_\ell(v) = \bar{\pi}(v) \bar{F}_\ell(v)$ ; $\ell = 0,1,2,.,N$.    (5.23)

where $\bar{F}_\ell(v) = V(v) F_\ell(v)$ is the N-dimensional instantaneous cost vector associated with the aggregate chain.
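The aggregation in (5.18) to (5.21) can be checked numerically on a small chain for one fixed policy. The sketch below builds a hypothetical four-state chain with two groups of two states, computes the local stationary rows $V_j$, the aggregate generator $VBT$ and its stationary vector, and compares the product with the exact stationary distribution of $A + \epsilon B$. All matrices and function names are assumptions made for illustration, and the linear solver is a plain Gauss-Jordan elimination.

```python
def stationary(Q):
    """Solve pi Q = 0, sum(pi) = 1 for an ergodic generator Q."""
    n = len(Q)
    M = [[Q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    M[n - 1] = [1.0] * n + [1.0]  # normalization replaces one equation
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def aggregate_approximation(blocks, B):
    """Return (eta, approx) where approx is the row vector eta V."""
    sizes = [len(Aj) for Aj in blocks]
    offs = [sum(sizes[:j]) for j in range(len(blocks))]
    local = [stationary(Aj) for Aj in blocks]  # rows V_j at eps = 0
    N = len(blocks)
    # Aggregate generator V B T: group-to-group rates weighted by V_j.
    agg = [[sum(local[j][i] * sum(B[offs[j] + i][offs[k] + m]
                                  for m in range(sizes[k]))
                for i in range(sizes[j]))
            for k in range(N)] for j in range(N)]
    eta = stationary(agg)
    approx = [eta[j] * local[j][i] for j in range(N) for i in range(sizes[j])]
    return eta, approx


def full_generator(blocks, B, eps):
    """Assemble A + eps B with A block diagonal."""
    n = len(B)
    Q = [[eps * B[r][c] for c in range(n)] for r in range(n)]
    off = 0
    for Aj in blocks:
        for r in range(len(Aj)):
            for c in range(len(Aj)):
                Q[off + r][off + c] += Aj[r][c]
        off += len(Aj)
    return Q


def approximation_error(blocks, B, eps):
    """Largest componentwise gap between pi(eps) and eta V."""
    pi = stationary(full_generator(blocks, B, eps))
    _, approx = aggregate_approximation(blocks, B)
    return max(abs(p - q) for p, q in zip(pi, approx))


# Hypothetical data: two groups of two strongly interacting states.
blocks = [[[-1.0, 1.0], [2.0, -2.0]], [[-3.0, 3.0], [1.0, -1.0]]]
B = [[-1.0, 0.0, 1.0, 0.0], [0.0, -1.0, 0.0, 1.0],
     [2.0, 0.0, -2.0, 0.0], [0.0, 2.0, 0.0, -2.0]]
eta, approx = aggregate_approximation(blocks, B)
```

Calling `approximation_error(blocks, B, eps)` for decreasing `eps` gives a gap that shrinks roughly in proportion to `eps`, which is the behaviour stated in (5.21). Note that only the two-by-two local problems and the two-by-two aggregate problem are solved to obtain the approximation.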


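We shall now obtain near-optimal policies based on the aggregate costs $\bar{J}_\ell(v)$. In terms of the aggregate dual variables we can write

$\alpha_\ell 1_N = V(v) B(v) T C_\ell + \bar{F}_\ell(v)$ ; $\alpha_\ell = \bar{J}_\ell(v)$ ; $C_\ell \in \mathbb{R}^N$ ; $\ell = 0,1,2,.,N$.    (5.24)

Alternatively,

$\alpha_\ell 1_N = V(v) [B(v) T C_\ell + F_\ell(v)] = V(v) g_\ell(v)$ ; $\ell = 0,1,2,.,N$.    (5.25)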


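where

$g_\ell(v) = \begin{bmatrix} g_{\ell 1} \\ g_{\ell 2} \\ \vdots \\ g_{\ell N} \end{bmatrix} = \begin{bmatrix} B_1(v_0, v_1) T C_\ell + F_{\ell 1}(v_0, v_1) \\ B_2(v_0, v_2) T C_\ell + F_{\ell 2}(v_0, v_2) \\ \vdots \\ B_N(v_0, v_N) T C_\ell + F_{\ell N}(v_0, v_N) \end{bmatrix}$    (5.26)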


Therefore, in component form (5.25) becomes

$\alpha_\ell = V_j(v_0, v_j) g_{\ell j}(v_0, v_j)$ ; $j = 1,2,.,N$ ; $\ell = 0,1,2,.,N$.    (5.27)

Each component in (5.27) can be interpreted as the average cost per stage associated with the $n_j$-dimensional local chains with generators $A_j(v_0, v_j)$.

Hence, we can write

$\alpha_\ell 1_{n_j} = A_j(v_0, v_j) C_{\ell j} + g_{\ell j}(v_0, v_j)$ ; $j = 1,2,.,N$ ; $\ell = 0,1,2,.,N$.    (5.28)

where $C_{\ell j} \in \mathbb{R}^{n_j}$ are the dual variables associated with the local chains.

Based  on  the  hierarchical  structure  given  by  (5.24)- (5.28) 
of  the  aggregate  costs,  we  now  formulate  an  algorithm  to  solve  the 
leader's  problem. 

Step  1;  Obtain  the  global  optimum  of  Jq(v)  by  the  following  iterative 
scheme. 

i)  Choose 

o 

ii)  Solve 

‘'oj<=o>ln  ■  g^j(v^.Vi)Jj  J-l,2,.,N  . 

using  the  algorithm  of  [39,40] 

iii)  Find  h  (C*^)  -  max  h  ,  (C*^) 

'  o  o'  j  oj '  o' 

h  (c'‘)  -  min  h  .  (c’‘) 

-o'  o  j  oj'  o' 

—  *lc  Ir 

If  otherwise  update  ^  by  the  algorithm  of 

[39];  set  l»-k+l  and  return  to  (ii).  Denote  the  optimal  solution  by 

A 

Vo  ;  ^0,1,2,.,N. 


j=l,2,.,N;  j#<e. 
Step 3: Solve the local optimization problem


1  V'^i 


applying  the  algorithm  of  [39]. 


Step  4;  Find 


-  .in 


■  If 

If  h.(C.)  -  h  (C.)  w  0  then  stop;  otherwise  update  C  by  the  algorithm  of 
jJ  »  4  *  * 

[39];  set  k*-k+l  and  return  to  step  2. 

When the algorithm converges, the leader's declared strategy ensures that the followers' optimal reaction based on the aggregate costs would be $\{v_\ell = \hat{v}_\ell ; \ell = 1,2,.,N\}$.

Let us now examine the salient features of the algorithm presented above. Specifically, we would like to see what each decision maker has to know about the system model and the costs in order to compute his strategies. The leader, being the overall coordinator, has to know the full A and B matrices and the cost vectors of all the decision makers.

Each follower, on the other hand, need only know his own local generator matrix $A_\ell$, the interconnection matrix B and the steady state distribution of the other local Markov chains along the optimal solution. He need not know the detailed dynamics of the other local Markov chains. This multimodel situation accounts for many practical problems where the 'local' decision makers do not have an exact knowledge of the 'global' model. Note that none of the decision makers needs to know the value of the perturbation parameter $\epsilon$.

In the sequel we shall give a series of propositions which establish the asymptotic properties of the multimodel solution given above.

Let us denote the optimal Stackelberg solution for the full problem as $(v_0^*, v_1^*, ., v_N^*)$.

The following proposition establishes the 'closeness' of the multimodel solution to the optimal solution.

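Proposition 5.1:

If the multimodel solution $\{\hat{v}_0, \hat{v}_1, ., \hat{v}_N\}$ is unique, then

i) $J_0(\hat{v}_0, \hat{v}_1, ., \hat{v}_N) - J_0(v_0^*, v_1^*, ., v_N^*) = O(\epsilon)$.

ii) $v_\ell^* = \hat{v}_\ell + O(\epsilon)$ ; $\ell = 0,1,2,.,N$.

iii) $J_\ell(\hat{v}_0, \hat{v}_1, ., \hat{v}_N) - J_\ell(v_0^*, v_1^*, ., v_N^*) = O(\epsilon)$ ; $\ell = 1,2,.,N$.

Proof: (i) and (ii) follow directly from [44] because the leader's problem is a global minimization problem.

To prove (iii) we let $\hat{v} = (\hat{v}_0, ., \hat{v}_N)$ and $v^* = (v_0^*, ., v_N^*)$, so that

$|J_\ell(\hat{v}) - J_\ell(v^*)| \le |J_\ell(\hat{v}) - \bar{J}_\ell(\hat{v})| + |J_\ell(v^*) - \bar{J}_\ell(v^*)| + |\bar{J}_\ell(\hat{v}) - \bar{J}_\ell(v^*)|$

Due to (5.22) we have

$|J_\ell(\hat{v}) - \bar{J}_\ell(\hat{v})| = O(\epsilon)$
$|J_\ell(v^*) - \bar{J}_\ell(v^*)| = O(\epsilon)$

Due to (ii) and the continuity of the aggregate costs in the controls we have $|\bar{J}_\ell(\hat{v}) - \bar{J}_\ell(v^*)| = O(\epsilon)$.

Hence, (iii) follows.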

The following two propositions establish the robustness properties of the multimodel solution.

Proposition  5.2; 

M  A  A 

Let  *  erg  oiin  Vj^,  •  •  J  .  ,  N  • 

V^je 

then  , 


I)  ^^l*  *  *  '^l*  *  *  "  0(t  )  >  .  |N 

i.e., no follower can benefit significantly by deviating unilaterally from the multimodel solution.

A  ^  ^  A  A 

II)  ^K^'^o’  ''l’  ’  »  ’  k-=0,l,.,N;  k^X 

i.e., by deviating unilaterally from the multimodel solution, no follower can hurt the other decision makers significantly.


Proof; 


1)  Since  “  erg  min  J^(v^,  Vj^»  •  «  v^,  .  y  \j^),  it  follows  directly 


from  [44]  that 


'^X^'^o'  '^l*  '  *  '*1’  ’  *  '^X’  ’  *  ^ 


Furthermore,  +  0(e)  ;  X"l,2,.,N. 


"  '^k^''o’''l’  *  *  '^^  “  *^k^''o’  ''l’  •  *  ''i*  •  * 

=  0(e)  ,  due  to  (i)  and  the  continuity  of  the  aggregate  costs  in 
the  controls. 


Proposition  5.3; 

Let  (vj^>\)2»  •  1  Vjj)  be  the  optimal  Nash  reaction  of  the  followers 
to  the  declared  strategy  of  the  leader;  then 

”  '^o^'^o*'*l*'^2’  *  *  **  ^(s) 

i.e., the leader does not lose significantly if the followers, instead of playing their multimodel strategies, respond optimally.

—  —  ★  ★  if 

ii)  *^0 ^'^o’'*l’ *  ‘  *  ”  '^o^'^o*'^!’  ’  *  *"  0(6) 

i.e., the leader does not lose significantly by declaring his multimodel strategy $\hat{v}_0$ instead of his optimal strategy $v_0^*$.

Proof; 

i)  Define 

^ )  »  i4“0»l»2,  •  j  N. 

By  observing  that  {v^  ;  ;^1,2,  .,N}  is  the  optimal  Nash  solution 

for  the  followers  with  respect  to  the  costs  {j°  ;  i*l,2,  .  ,  N}; 

Jt 

A 

and  (v«  i  ^^>2,  .  )  N}  is  the  optimal  Nash  solution  for  the 
Jl 

followers  with  respect  to  the  aggregate  costs  {7]  ;  i^l,2,  .  ^  N}; 

Jl 

we can show by constructing matched asymptotic expansions as in [44] that


•  I  Vjj)  “  .  j  N  j 


furthermorej  *  4“1»2»  •  >  N. 


Hence, (i) follows because of continuity of costs.
ii) follows from (i) above and (i) of Proposition 5.1.


5.7.  An  Example 

We shall now consider a numerical example of a weakly-coupled Markov chain and obtain the near-optimal incentive policies. The example is motivated by the following hydro-scheduling problem for electric power generation.

Consider a hydro-power system consisting of a central reservoir which feeds into three local reservoirs. For simplicity assume that the central reservoir feeds into the local reservoirs one at a time, and switches between reservoirs in a random fashion.

When the central reservoir is feeding into one of the local reservoirs, the other two reservoirs are assumed to be in some 'idle' state. Each local reservoir is assumed to be under the authority of a separate decision maker who controls the rate of water release $u_\ell$ for electric power generation. The 'state' of each local reservoir is characterized by its water level, assumed to be 1, 2, 3 when it is active, and 'idle' when it is inactive. The central reservoir is assumed to have an infinite capacity. The local level changes are assumed to be of high probability compared to the switching of the central reservoir between the different local reservoirs. There is an overall coordinator or leader who controls the rate of switching and the inflows into the local reservoirs. His decision variable is assumed to be $u_0$. The objective of each local decision maker is to minimize his own local average production cost per unit time, whereas the objective of the leader is to minimize the global average production cost per unit time.

The above system can be modeled by a nine state Markov chain consisting of 3 weakly-coupled groups of strongly-interacting states. The states are as follows:

1 = (inflow into reservoir 1, level 1, reservoirs 2,3 idle)

2 = (inflow into reservoir 1, level 2, reservoirs 2,3 idle)

3 = (inflow into reservoir 1, level 3, reservoirs 2,3 idle)

4 = (inflow into reservoir 2, level 1, reservoirs 1,3 idle)

5 = (inflow into reservoir 2, level 2, reservoirs 1,3 idle)

6 = (inflow into reservoir 2, level 3, reservoirs 1,3 idle)

7 = (inflow into reservoir 3, level 1, reservoirs 1,2 idle)

8 = (inflow into reservoir 3, level 2, reservoirs 1,2 idle)

9 = (inflow into reservoir 3, level 3, reservoirs 1,2 idle).
The instantaneous costs are,

$f_1(n, u_0(n), u_1(n)) = (2-n)^2 + 25 (u_1(n))^2 + 10 (u_0(n))^2$ ; $n = 1,2,3$

$f_2(n, u_0(n), u_2(n)) = (5-n)^2 + 30 (u_2(n))^2 + 10 (u_0(n))^2$ ; $n = 4,5,6$

$f_3(n, u_0(n), u_3(n)) = (8-n)^2 + 20 (u_3(n))^2 + 15 (u_0(n))^2$ ; $n = 7,8,9$


Using the algorithm of the previous section, the near-optimal affine incentive policy for the leader is obtained as

$u_0(1) = 0.067 - 0.5833 (u_1(1) - 0.146)$
$u_0(2) = 0.052 - 0.4762 (u_1(2) - 0.098)$
$u_0(3) = 0.046 - 0.6334 (u_1(3) - 0.051)$
$u_0(4) = 0.071 - 0.5721 (u_2(4) - 0.131)$
$u_0(5) = 0.066 - 0.4654 (u_2(5) - 0.081)$
$u_0(6) = 0.056 - 0.6142 (u_2(6) - 0.051)$
$u_0(7) = 0.055 - 0.6518 (u_3(7) - 0.164)$
$u_0(8) = 0.048 - 0.5532 (u_3(8) - 0.112)$
$u_0(9) = 0.044 - 0.7156 (u_3(9) - 0.06)$
The optimal strategies for the followers are given by,

$\hat{v}_1 = [u_1(1), u_1(2), u_1(3)] = [0.146, 0.098, 0.051]$

$\hat{v}_2 = [u_2(4), u_2(5), u_2(6)] = [0.131, 0.081, 0.051]$

$\hat{v}_3 = [u_3(7), u_3(8), u_3(9)] = [0.164, 0.112, 0.06]$
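The way the affine incentive policy works can be seen by evaluating it numerically. The sketch below encodes each row of the policy as a triple $(a, b, c)$ in $u_0(n) = a - b\,(u_j(n) - c)$, with the numbers as read from the table above. The names `INCENTIVE`, `FOLLOWER` and `leader_control` are hypothetical. When the follower in charge of state $n$ plays his listed optimal release, the correction term vanishes and the leader applies the open-loop value $a$; any deviation shifts the leader's control in proportion to the gain $b$.

```python
# (a, b, c) for each state n, as read from the affine policy table:
# u0(n) = a - b * (u_follower(n) - c)
INCENTIVE = {
    1: (0.067, 0.5833, 0.146),
    2: (0.052, 0.4762, 0.098),
    3: (0.046, 0.6334, 0.051),
    4: (0.071, 0.5721, 0.131),
    5: (0.066, 0.4654, 0.081),
    6: (0.056, 0.6142, 0.051),
    7: (0.055, 0.6518, 0.164),
    8: (0.048, 0.5532, 0.112),
    9: (0.044, 0.7156, 0.06),
}

# Followers' optimal releases per state, as listed in the text.
FOLLOWER = {1: 0.146, 2: 0.098, 3: 0.051,
            4: 0.131, 5: 0.081, 6: 0.051,
            7: 0.164, 8: 0.112, 9: 0.06}


def leader_control(n, u_follower):
    """Leader's control in state n given the follower's observed release."""
    a, b, c = INCENTIVE[n]
    return a - b * (u_follower - c)


# Along the followers' optimal strategies the leader plays the open-loop value.
on_path = {n: leader_control(n, FOLLOWER[n]) for n in INCENTIVE}
# A follower who releases 0.1 more than planned in state 1 sees a lower u0.
off_path = leader_control(1, FOLLOWER[1] + 0.1)
```

The leader needs to observe only the current state and the release of the follower who controls that state, which matches the information pattern assumed for the incentive problem.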


The resulting costs (for $\epsilon = 0.1$) are given by

-  0*76541 

-  0*75332 
»  0-75884 


0*75917 


For $\epsilon = 0.01$:

$u_0(1) = 0.066 - 0.5848 (u_1(1) - 0.144)$
$u_0(2) = 0.052 - 0.4814 (u_1(2) - 0.1)$
$u_0(3) = 0.045 - 0.6444 (u_1(3) - 0.051)$
$u_0(4) = 0.07 - 0.5688 (u_2(4) - 0.13)$
$u_0(5) = 0.068 - 0.4711 (u_2(5) - 0.082)$
$u_0(6) = 0.056 - 0.6155 (u_2(6) - 0.051)$
$u_0(7) = 0.057 - 0.6622 (u_3(7) - 0.165)$
$u_0(8) = 0.048 - 0.5601 (u_3(8) - 0.11)$
$u_0(9) = 0.044 - 0.711 (u_3(9) - 0.056)$


$J_0 = 0.74186$


$\bar{J}_0(\hat{v}) = 0.74212$


The above numerical computations clearly illustrate the convergence of $J_0$ to $\bar{J}_0(\hat{v})$ as $\epsilon \to 0$.


5.8.  Conclusions 


In this chapter we have considered the average-cost-per-stage problem for finite-state Markov chains controlled by multiple decision makers. After formulating the general decision problem and obtaining certain fundamental existence results, we focused our attention on the multimodeling problem for a class of Markov models consisting of N weakly-coupled groups of strongly-interacting states. We have outlined a procedure for obtaining near-optimal incentive policies, which allows the 'local' decision makers to use different simplified models of the system. Specifically, we have shown that each 'local' decision maker need only know the generator matrix of his own local Markov chain, the generator matrix describing the intergroup transitions, and the invariant measure of the other local chains along the optimal solution. Only the coordinator needs an exact knowledge of the 'global' model. The well-posedness of the procedure has been illustrated by a numerical example.


CHAPTER  6 


INFORMATION INDUCED MULTIMODEL SOLUTIONS

6.1. Introduction

In the previous chapters we adopted a perturbational approach to the multimodeling problem. The crucial issue was one of well-posedness of the multimodel design. We had to establish the convergence of the optimal solution to the multimodel solution in the limit as the perturbational parameters go to zero.

In this chapter we attempt to induce a decomposition of the problem based on input-output considerations, such that the optimal solution within a class of admissible strategies can be obtained from multiple reduced-order models with partial noninteraction among the decision makers.

In large scale systems, the DMs observe, in general, different variables through their individual objective functionals. These observed variables play a crucial role in the solution of the problem. Here we focus on the role of the observed variables in multimodel strategy design.

We attempt to identify the core by examining the input structure and the observability structure induced by the observation sets of the DMs.

In Section 6.2 we formulate the problem, and discuss the structural decomposition and the class of admissible strategies referred to as Structure-Preserving strategies. In Section 6.3 we obtain multimodel solutions under FPS and FIS information patterns. In Section 6.4 we discuss decoupling of completely observable systems. In Section 6.5, we discuss briefly extensions to many decision maker problems and Pareto games. In Section 6.6, we study applications of the concepts to control of large scale interconnected subsystems and multiarea power systems. Section 6.7 concludes the chapter.

6.2.  Problem  Formulation 


6.2.1.  The  problem 

Consider a linear system controlled by two DMs,

$\dot{x} = Ax + B_1 u_1 + B_2 u_2$ ; $x(0) = x_0$    (6.1a)

$y_i = C_i x$ ; $i = 1,2$    (6.1b)

$\dim x = n$, $\dim u_i = m_i$, $\dim y_i = p_i$.

The variables $y_i$ will be referred to as the 'observation set' of each DM. These are in fact the controlled variables as seen through the performance index of each DM, and may or may not correspond to the actual system outputs available to each DM.

The  performance  index  of  each  DM  is  given  by 

09 

Ji(Yi»Y2)  “  C  2  I  "i^^^  "  Yi(*)};  i  “  1,2  (6.2) 


where $\gamma_i(\cdot)$ is the admissible strategy of DMi, measurable with respect to the sigma-algebra generated by his information set (to be specified later).
The DMs are to select optimal strategies $\{\gamma_i^* ; \gamma_i^* \in \Gamma_i ; i = 1,2\}$ such that

$J_i(\gamma_i^*, \gamma_j^*) \le J_i(\gamma_i, \gamma_j^*)$ ; $\forall\, \gamma_i \in \Gamma_i$ ; $i,j = 1,2$ ; $i \ne j$    (6.3)

where $\{\Gamma_i ; i = 1,2\}$ are some admissible strategy sets for the DMs to be specified later. The pair of inequalities in (6.3) define the Nash equilibrium point.

In large scale game problems, the 'curse of dimensionality' may render any direct approach to the optimal solution computationally intractable. Hence there is a strong motivation for the DMs to look for alternative approaches to the problem which ease the computational difficulties. The approach formulated in the sequel has the desirable feature that it induces a partial noninteraction among the DMs leading to a lower order game. This is done by choosing appropriate admissible strategy sets $\Gamma_i$ based on a particular structural decomposition of the system.

6.2.2.  Structural  decomposition 

The observation sets of the DMs given by (6.1b) induce a certain observability decomposition on the state space. We propose to exploit this decomposition to obtain multimodel strategies. To do this, we start by exhibiting this observability decomposition explicitly by transforming the state space. This may be done either by performing chained aggregation sequentially with respect to each DM's observation set [8,54,55]; or, equivalently, by making a similarity transformation directly, following a procedure dual to the one in [56,61] where a controllability decomposition was achieved.
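Before any transformation is carried out, the sizes of the parts of such a decomposition can be read off from the ranks of the observability matrices of the individual and joint observation sets. The sketch below is a minimal illustration on a hypothetical three-mode diagonal system; the matrices, function names and tolerance are assumptions, and it counts dimensions only, without constructing the similarity transformation or the chained aggregation of the cited references.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]


def observability_matrix(C, A):
    """Stack C, CA, ..., CA^(n-1) row-wise."""
    rows, block = [], [list(r) for r in C]
    for _ in range(len(A)):
        rows.extend(block)
        block = matmul(block, A)
    return rows


def rank(M, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    M = [[float(x) for x in row] for row in M]
    rk = 0
    for col in range(len(M[0])):
        if rk == len(M):
            break
        pivot = max(range(rk, len(M)), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < tol:
            continue  # no usable pivot in this column
        M[rk], M[pivot] = M[pivot], M[rk]
        for r in range(rk + 1, len(M)):
            f = M[r][col] / M[rk][col]
            M[r] = [x - f * y for x, y in zip(M[r], M[rk])]
        rk += 1
    return rk


# Hypothetical system: three decoupled modes; DM1 sees modes 1 and 3,
# DM2 sees modes 2 and 3.
A = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
C1 = [[1.0, 0.0, 1.0]]
C2 = [[0.0, 1.0, 1.0]]

r1 = rank(observability_matrix(C1, A))        # observable to DM1
r2 = rank(observability_matrix(C2, A))        # observable to DM2
r12 = rank(observability_matrix(C1 + C2, A))  # observable to at least one DM

n = len(A)
only_dm1 = r12 - r2        # modes seen by DM1 but not by DM2
only_dm2 = r12 - r1        # modes seen by DM2 but not by DM1
common = r1 + r2 - r12     # modes seen by both DMs
unobservable = n - r12     # jointly unobservable modes
```

For this example each DM has one private mode and one shared mode, and nothing is jointly unobservable. For large or badly scaled systems a rank decision based on a fixed tolerance is fragile, so a numerically robust decomposition would be preferable in practice.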


The eigenvalues of $\{A_{ii} ; i = 1,2\}$ represent the modes which are observable only to DMi but not to DMj ($i \ne j$); the eigenvalues of $A_{33}$ represent the modes which are observable to both the DMs; and the eigenvalues of $A_{44}$ represent the modes which are unobservable to both the DMs.

For simplicity we shall neglect the jointly unobservable modes. In a well formulated problem these modes are stable and do not contribute anything to the cost. Hence, from now onwards we shall assume the system matrices to have the following form:


<=i*  t<=ii 


=13  > 


(6.5b) 


The input structure specified by the matrices $B_1$, $B_2$ is not in a form suitable for our analysis. We need to make input space transformations in order to appropriately overlap the input structure with the observability
decomposition.  Assuming  that  the  pairs  ^ 

controllable,  there  exist  matrices  G2  such  that  the  input  space  transfor¬ 
mation  {u^  *  ^i^i*  ^  gives  the  new  input  matrices  the  following  form 


Bi  -  B,G, 


®11  ®14 


0  B, 


0  B, 


®2  - 


0  B„ 


®22  ®24 


0  B, 


(6.6) 


where  the  pairs  ^  “  1*2}  ate  controllable. 

Remarks ;  Before  performing  the  input  space  transformation,  we  might  need  to 
do  another  state  space  transformation;  but  this  can  be  done  without  destroy¬ 
ing  the  observability  decomposition.  This  is  to  put  the  system  in  an  appro¬ 
priate  basis  such  that  Z  •  where  Z  is  the  state  space  and  is  a 

controllability  subspace  of  DMi.  The  input  space  transformations  G^  identify 
explicitly  the  control  channels  through  which  the  individually  observable 
modes  are  completely  controllable  [57]. 


6.2.3. Structure-Preserving strategies


where


*^1  "  2  ;  i  -  1,2  ; 

■  shall  assuoe  t±ac 


(6.8) 


i.j  -  1.2  ;  i  j*  j  . 


The nature of the results obtained here holds for arbitrary positive definite $\hat{R}_i$; but assuming a block-diagonal form results in simpler derivations.
Before we obtain the Nash solution, we need to define the set of admissible strategies for each DM. The admissible strategy sets that we are particularly interested in here will be referred to as 'Structure-Preserving' strategies and are defined below.

Definition: A Structure-Preserving strategy set is the set of all linear feedback strategies which preserve the observability decomposition (6.5) of the closed-loop system.

In the single DM case, the three-component control of [55,62] is a Structure-Preserving control. After the first component achieves decoupling, the second and third components, which control the aggregate and the residual, respectively, are Structure-Preserving. The design in [55] was purely from a pole-placement point of view without any optimality considerations. Here we shall show that in the multiple DM case, the design of Structure-Preserving Nash strategies leads to multimodel solutions.


6.3  Multimodel  Solutions 


We shall consider two types of information patterns for the DMs:
the Feedback Perfect State (FPS) and the Feedback Imperfect State (FIS).
Under the FPS information pattern, each DM knows, at time t, the current
state of the system, x(t); and under the FIS information pattern, each DM
knows, at time t, only the current value of his observation, y(t).


6.3.1.  FPS  information  pattern 

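Under the FPS information pattern, the admissible strategy set
of DMi is the set of linear state feedback strategies which are
Structure-Preserving. Specifically,

$$
\Gamma_i = \left\{ \gamma_i \;\middle|\; \gamma_i(x) = -F_ix = -\begin{bmatrix} F_{ii}\delta_{i1} & F_{ii}\delta_{i2} & F_{i3} \\ 0 & 0 & F_{3i} \end{bmatrix} x \right\}; \quad i = 1,2
\tag{6.9}
$$

where $\delta_{ij}$ is the Kronecker delta.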

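Now, to find the Nash solution, we need to find a pair $\{\gamma_i^* \in \Gamma_i;\ i = 1,2\}$
such that the pair of inequalities (6.3) is satisfied. Substituting
$u_i = \gamma_i(x)$ from (6.9) in (6.7) and (6.8) we get

$$ \dot{x} = \hat{A}x; \quad x(0) = x_0 \tag{6.10a} $$

$$ y_i = C_ix \tag{6.10b} $$

$$ J_i = \frac{1}{2}\int_0^\infty x'\hat{Q}_ix\,dt; \quad i = 1,2 \tag{6.11} $$

where

$$ \hat{Q}_i = C_i'C_i + F_i'\hat{R}_iF_i; \quad i = 1,2 \tag{6.12} $$

and the closed-loop system matrix is

$$
\hat{A} = A - B_1F_1 - B_2F_2 =
\begin{bmatrix}
(A_{11} - B_{11}F_{11}) & 0 & (A_{13} - B_{11}F_{13} - B_{14}F_{31} - B_{21}F_{32}) \\
0 & (A_{22} - B_{22}F_{22}) & (A_{23} - B_{22}F_{23} - B_{24}F_{32} - B_{12}F_{31}) \\
0 & 0 & (A_{33} - B_{13}F_{31} - B_{23}F_{32})
\end{bmatrix}
=
\begin{bmatrix}
\hat{A}_{11} & 0 & \hat{A}_{13} \\
0 & \hat{A}_{22} & \hat{A}_{23} \\
0 & 0 & \hat{A}_{33}
\end{bmatrix}
\tag{6.13}
$$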
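The zero pattern of (6.13) can be checked numerically. The following is a minimal sketch, assuming scalar blocks and purely hypothetical numeric values (none of them come from the thesis): gains with the zero pattern of (6.9) keep the closed-loop matrix block upper triangular, while a gain that violates the pattern does not. The helper names `matmul`, `closed_loop` and `preserves_structure` are illustrative only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def closed_loop(a, b1, f1, b2, f2):
    """Return A - B1 F1 - B2 F2 for a 3 x 3 (scalar-block) system."""
    p, q = matmul(b1, f1), matmul(b2, f2)
    return [[a[i][j] - p[i][j] - q[i][j] for j in range(3)] for i in range(3)]


def preserves_structure(m, tol=1e-12):
    """True if the (1,2), (2,1), (3,1), (3,2) blocks vanish, as in (6.13)."""
    return all(abs(m[i][j]) < tol for i, j in [(0, 1), (1, 0), (2, 0), (2, 1)])


# Hypothetical scalar blocks: A in observability-decomposition form,
# input matrices with the zero pattern of (6.6).
A = [[-1.0, 0.0, 0.5], [0.0, -2.0, 0.3], [0.0, 0.0, 0.4]]
B1 = [[1.0, 0.2], [0.0, 0.1], [0.0, 1.0]]   # [B11 B14; 0 B12; 0 B13]
B2 = [[0.0, 0.3], [1.0, 0.4], [0.0, 1.0]]   # [0 B21; B22 B24; 0 B23]

# Gains with the zero pattern of (6.9).
F1 = [[0.7, 0.0, 0.2], [0.0, 0.0, 0.9]]     # [F11 0 F13; 0 0 F31]
F2 = [[0.0, 0.6, 0.1], [0.0, 0.0, 0.8]]     # [0 F22 F23; 0 0 F32]
A_hat = closed_loop(A, B1, F1, B2, F2)

# A gain that feeds x2 back through DM1's first channel breaks the pattern.
F1_bad = [[0.7, 0.5, 0.2], [0.0, 0.0, 0.9]]
A_bad = closed_loop(A, B1, F1_bad, B2, F2)

print(preserves_structure(A_hat), preserves_structure(A_bad))
```

In this toy case the unstable core block (0.4) is stabilized only through the joint action of the two third-row gains, which mirrors the role of the core problem in the text.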


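The optimal solution $\{F_{ii}^*, F_{i3}^*, F_{3i}^*;\ i = 1,2\}$ will depend in general
on the initial conditions $x_0$ [58]. To remove this dependence, we follow [58]
and assume that the initial conditions are random with

$$ E[x_0x_0'] = N > 0, \tag{6.14} $$

and modify the cost functionals to be

$$ \hat{J}_i = \frac{1}{2}E_{x_0}\left\{\int_0^\infty x'\hat{Q}_ix\,dt\right\}; \quad i = 1,2. \tag{6.15} $$

Introduce $M_i, L \in R^{n \times n}$ defined by

$$ \frac{1}{2}x_0'M_ix_0 = \frac{1}{2}\int_0^\infty x'\hat{Q}_ix\,dt \tag{6.16} $$

$$ L = \int_0^\infty E[x(t)x'(t)]\,dt \tag{6.17} $$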

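For any given pair $\{F_1, F_2\}$ such that $\mathrm{Re}\,\lambda(\hat{A}) < 0$, $M_i > 0$ and $L > 0$ satisfy
the matrix Lyapunov equations

$$ M_i\hat{A} + \hat{A}'M_i + \hat{Q}_i = 0; \quad i = 1,2 \tag{6.18} $$

$$ \hat{A}L + L\hat{A}' + N = 0. \tag{6.19} $$

Partition $M_i$, $L$, $N$ appropriately

$$
M_i = \begin{bmatrix}
M_{11}^{(i)} & M_{12}^{(i)} & M_{13}^{(i)} \\
M_{12}^{(i)\prime} & M_{22}^{(i)} & M_{23}^{(i)} \\
M_{13}^{(i)\prime} & M_{23}^{(i)\prime} & M_{33}^{(i)}
\end{bmatrix}; \quad i = 1,2
\tag{6.20a}
$$

$$
L = \begin{bmatrix}
L_{11} & L_{12} & L_{13} \\
L_{12}' & L_{22} & L_{23} \\
L_{13}' & L_{23}' & L_{33}
\end{bmatrix}
\tag{6.20b}
$$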
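The two Lyapunov equations (6.18) and (6.19) give two equivalent ways of evaluating the averaged cost, since E[x_0' M x_0] = tr(M N) and the integral of E[x' Q x] equals tr(Q L). The sketch below illustrates this on a hypothetical 2 x 2 stable closed-loop matrix; the numbers and the helper names `solve_linear`, `lyapunov` and `trace_of_product` are assumptions for illustration, and the small dense solver stands in for any library Lyapunov routine.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def lyapunov(a, q):
    """Return X solving a' X + X a + q = 0 for a stable n x n matrix a."""
    n = len(a)
    coeff = [[0.0] * (n * n) for _ in range(n * n)]
    rhs = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            row = i * n + j          # equation for entry (i, j)
            for k in range(n):
                coeff[row][k * n + j] += a[k][i]   # (a' X)[i][j]
                coeff[row][i * n + k] += a[k][j]   # (X a)[i][j]
            rhs[row] = -q[i][j]
    x = solve_linear(coeff, rhs)
    return [x[i * n:(i + 1) * n] for i in range(n)]


def trace_of_product(x, y):
    """Return tr(x y) for square matrices of equal size."""
    n = len(x)
    return sum(x[i][k] * y[k][i] for i in range(n) for k in range(n))


# Hypothetical closed-loop data (not from the thesis).
A_hat = [[-1.0, 2.0], [0.0, -3.0]]
Q_hat = [[2.0, 0.0], [0.0, 1.0]]
N = [[1.0, 0.0], [0.0, 1.0]]
A_hat_T = [list(r) for r in zip(*A_hat)]

M = lyapunov(A_hat, Q_hat)     # A_hat' M + M A_hat + Q_hat = 0, as in (6.18)
L = lyapunov(A_hat_T, N)       # A_hat L + L A_hat' + N = 0, as in (6.19)

cost_from_M = 0.5 * trace_of_product(M, N)
cost_from_L = 0.5 * trace_of_product(Q_hat, L)
print(cost_from_M, cost_from_L)
```

Both evaluations agree for this example, which is the identity that lets the Matrix Minimum Principle treat L as the state and M as the costate.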


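Applying the Matrix Minimum Principle [59], the optimal
$\{F_{ii}^*, F_{i3}^*, F_{3i}^*\}$ for the Feedback Nash solution can be shown to satisfy
(for $i, j = 1,2;\ i \neq j$)

$$ R_{ii}F_{ii}^*L_{ii} + R_{ii}F_{i3}^*L_{i3}' - B_{ii}'M_{ii}^{(i)}L_{ii} - B_{ii}'M_{i3}^{(i)}L_{i3}' = 0 \tag{6.21a} $$

$$ R_{ii}F_{ii}^*L_{i3} + R_{ii}F_{i3}^*L_{33} - B_{ii}'M_{i3}^{(i)}L_{33} - B_{ii}'M_{ii}^{(i)}L_{i3} = 0 \tag{6.21b} $$

$$ R_{ij}F_{3i}^*L_{33} - B_{i3}'M_{i3}^{(i)\prime}L_{i3} - B_{i3}'M_{33}^{(i)}L_{33} - B_{i4}'M_{ii}^{(i)}L_{i3} - B_{i4}'M_{i3}^{(i)}L_{33} = 0 $$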

By  an  abuse  of  terminology,  we  shall  refer  to  the  solution  as 
Feedback  Nash.  It  is  not  Feedback  Nash  in  the  sense  defined  in  [9]  because 
it  does  not  satisfy  the  Principle  of  Optimality.  It  is  Nash  in  feedback 
information  pattern. 


11  ^11  ^11** 


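$$ M_{ii}^{(i)}\hat{A}_{i3}^* + M_{i3}^{(i)}\hat{A}_{33}^* + \hat{A}_{ii}^{*\prime}M_{i3}^{(i)} + C_{ii}'C_{i3} + F_{ii}^{*\prime}R_{ii}F_{i3}^* = 0 \tag{6.22c} $$

$$ M_{33}^{(i)}\hat{A}_{33}^* + \hat{A}_{33}^{*\prime}M_{33}^{(i)} + M_{i3}^{(i)\prime}\hat{A}_{i3}^* + \hat{A}_{i3}^{*\prime}M_{i3}^{(i)} + C_{i3}'C_{i3} + F_{i3}^{*\prime}R_{ii}F_{i3}^* + F_{3i}^{*\prime}R_{ij}F_{3i}^* = 0 \tag{6.22d} $$

$$ \hat{A}_{ii}^*L_{i3} + L_{i3}\hat{A}_{33}^{*\prime} + \hat{A}_{i3}^*L_{33} + N_{i3} = 0 \tag{6.23a} $$

$$ \hat{A}_{33}^*L_{33} + L_{33}\hat{A}_{33}^{*\prime} + N_{33} = 0 \tag{6.23b} $$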


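$\{\hat{A}_{ii}^*, \hat{A}_{i3}^*, \hat{A}_{33}^*;\ i = 1,2\}$ are as in (6.13) with
$\{F_{ii} = F_{ii}^*,\ F_{i3} = F_{i3}^*,\ F_{3i} = F_{3i}^*;\ i = 1,2\}$. Solving (6.21) we obtain,

$$ F_{ii}^* = R_{ii}^{-1}B_{ii}'M_{ii}^{(i)} \tag{6.24a} $$

$$ F_{i3}^* = R_{ii}^{-1}B_{ii}'M_{i3}^{(i)} \tag{6.24b} $$

$$ F_{3i}^* = R_{ij}^{-1}B_{i3}'\left(M_{33}^{(i)} + M_{i3}^{(i)\prime}L_{i3}L_{33}^{-1}\right) + R_{ij}^{-1}B_{i4}'\left(M_{i3}^{(i)} + M_{ii}^{(i)}L_{i3}L_{33}^{-1}\right) $$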


Notice that even though equations (6.21a) and (6.21b) are coupled in $F_{ii}^*$ and
$F_{i3}^*$, we are able to solve for them explicitly as in (6.24a) and (6.24b).
This fact plays a crucial role in showing that the Nash solution admits a
partial noninteraction. Substituting (6.24a) in (6.22b) we obtain,

$$ M_{ii}^{(i)}A_{ii} + A_{ii}'M_{ii}^{(i)} + C_{ii}'C_{ii} - M_{ii}^{(i)}B_{ii}R_{ii}^{-1}B_{ii}'M_{ii}^{(i)} = 0; \quad i = 1,2 \tag{6.25} $$

It can be readily seen from (6.24a) and (6.25) that $F_{ii}^*$ is the solution of
an optimal state regulator problem with parameters $(A_{ii}, B_{ii}, C_{ii}, R_{ii})$.
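For a scalar block the regulator problem behind (6.24a) and (6.25) can be solved in closed form, which makes the decentralized step easy to see. The sketch below assumes scalar, hypothetical values a, b, c, r standing in for A_ii, B_ii, C_ii, R_ii (with b and c nonzero, the scalar counterpart of controllability-observability); it is an illustration, not the thesis's computation.

```python
import math


def scalar_regulator(a, b, c, r):
    """Positive root m of 2*a*m + c**2 - (b**2 / r) * m**2 = 0 and gain b*m/r."""
    s = b * b / r
    m = (a + math.sqrt(a * a + s * c * c)) / s
    f = b * m / r
    return m, f


# Hypothetical open-loop unstable block (a > 0).
a, b, c, r = 0.5, 1.0, 2.0, 1.0
m, f = scalar_regulator(a, b, c, r)

residual = 2 * a * m + c * c - (b * b / r) * m * m   # scalar form of (6.25)
closed_loop_pole = a - b * f                         # scalar form of A_ii - B_ii F_ii

print(m, f, closed_loop_pole)
```

The closed-loop pole equals minus the square root of a^2 + b^2 c^2 / r, so it is negative whenever b and c are nonzero, the scalar analogue of the stability guarantee stated in (6.26). Each DM can run this step using only his own block data.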

The  following  proposition  highlights  the  multimodel  nature  of  the  Nash 
solution. 

Proposition 6.1:

Given the linear system (6.7) controlled by two DMs, and their
performance indices (6.8), the design of Structure-Preserving Feedback
Nash strategies under the FPS information pattern leads to two low-order
coupled optimization problems defined by

$$ \min\; E\left\{\frac{1}{2}\int_0^\infty \left(y_i'y_i + u_i'\hat{R}_iu_i\right)dt\right\} $$

subject to

$$
\dot{z}_i = \begin{bmatrix} A_{ii} & A_{i3} - B_{ji}F_{3j} \\ 0 & A_{33} - B_{j3}F_{3j} \end{bmatrix} z_i
+ \begin{bmatrix} B_{ii} & B_{i4} \\ 0 & B_{i3} \end{bmatrix} u_i; \quad z_i(0) = z_{i0}
$$

$$ y_i = \begin{bmatrix} C_{ii} & C_{i3} \end{bmatrix} z_i $$

where

$$
u_i = -\begin{bmatrix} F_{ii} & F_{i3} \\ 0 & F_{3i} \end{bmatrix} z_i, \qquad
E[z_{i0}z_{i0}'] = \begin{bmatrix} N_{ii} & N_{i3} \\ N_{i3}' & N_{33} \end{bmatrix}; \qquad
i, j = 1,2;\ i \neq j.
$$

The solution to this pair of coupled optimization problems admits partial
noninteraction among the DMs, and is given by the set of equations (6.22)-(6.25).

At this point we would like to remark that the controllability-observability
of the triple $\{(A_{ii}, B_{ii}, C_{ii});\ i = 1,2\}$ guarantees

$$ \mathrm{Re}\,\lambda(\hat{A}_{ii}^*) < 0; \quad i = 1,2 \tag{6.26} $$

For the solution to be well-defined we need only to verify that $\mathrm{Re}\,\lambda(\hat{A}_{33}^*) < 0$.

The coupling between the optimization problems of the two DMs is due
to the presence of the control gain $F_{3j}^*$ of DMj in DMi's low-order model.
Partial noninteraction is achieved because each DM can evaluate his control gain
$F_{ii}^*$ independently in a decentralized manner by solving equations (6.24a) and
(6.25). The control gains $\{F_{i3}^*, F_{3i}^*;\ i = 1,2\}$ are then obtained by solving the
coupled set of equations (6.22c-d), (6.23), (6.24).

Hence, we have succeeded in identifying the 'core' of a high-order
problem where the DMs actually interact, and a pair of low-order control
problems, one for each DM. This has been achieved by restricting the
admissible strategy sets of the DMs to Structure-Preserving strategies under the
FPS information pattern, and transforming the state space and input space
appropriately.

Notice that $F_{ii}^*$ is independent of the statistics of the initial
conditions since it is obtained from (6.24a) and (6.25). But $\{F_{i3}^*, F_{3i}^*;\ i = 1,2\}$
do depend, in general, on the statistics of the initial conditions,
as they are obtained from the coupled set of equations (6.22c-d), (6.23),
(6.24), which may be difficult to solve in practice. The gain matrices
$\{F_{i3}^*, F_{3i}^*;\ i = 1,2\}$ which result in $\{L_{i3} = 0;\ i = 1,2\}$ are of particular
interest, as they are computationally simpler to obtain. Such a set is
given by

$$ F_{i3}^* = R_{ii}^{-1}B_{ii}'M_{i3}^{(i)} \tag{6.27a} $$

$$ F_{3i}^* = R_{ij}^{-1}\left[B_{i4}'M_{i3}^{(i)} + B_{i3}'M_{33}^{(i)}\right]; \quad i, j = 1,2;\ i \neq j \tag{6.27b} $$

Here $\{M_{i3}^{(i)}, M_{33}^{(i)}\}$ satisfy the coupled set of equations (for $i, j = 1,2;\ i \neq j$).


*  «I3’*33  *  '^1X3’+  =ii^l3  - 

-  ■ 


0 


(6.28i 
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M33  Ajj  +  A23  M23  +  Mj^3  A^3  +  Aj^3M^3  +  Cj^3Cj^3  M33  Sj^3M33  M33  S^Mj^3 

^(1)*-  (1)  „(i)_  ^(J).  M(j)c  m(^>- 

-  Mj^3  S^M23  M33  S^3M33  M33  SJ3M33  M33  S^M^3  Mj3  SJM33 

**13  ®  11^13  “i3  ®i4“i3  **13  ®jl“33 


«a)s’  M<^>-  M<^>s  -  0 

**33  ®ji“i3  “i3  ^Ji“j3  “j3  ®ji“i3 


(6.28b) 


where. 


®ii  "  ®ii^li®li’  ^13  “  ®i3^i^i3*  ®ij  “  ®ij\j^i3 


A  ^  ^  ^  ^  g  A  A 

^14  “  ®i4\j®i4’  ^IJ  “  ®ij\j®i4*  *  ®i4\j®i3 


Furthermore $\{F_{i3}^*, F_{3i}^*;\ i = 1,2\}$ are such that

$$ \hat{A}_{i3}^*L_{33} + N_{i3} = 0; \quad i = 1,2; \tag{6.29} $$

where $L_{33}$ is the positive definite solution of (6.23b).

Notice that if the initial cross-covariance $N_{13} = N_{23} = 0$, then (6.29)
is satisfied if and only if $\hat{A}_{13}^* = \hat{A}_{23}^* = 0$; which would be true if the
solution of (6.27) and (6.28) block-diagonalizes the closed-loop system.


6.3.2.  FIS  information  pattern 

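It can be readily seen that when the output matrices are of the form
given by (6.5b), Structure-Preserving strategies involving only static linear
output feedback do not exist.

When the output matrices split so that there are separate observation
channels for the individually and commonly observable modes, i.e., when

$$
C_i = \begin{bmatrix} C_{ii}\delta_{i1} & C_{ii}\delta_{i2} & 0 \\ 0 & 0 & C_{i3} \end{bmatrix}; \quad i = 1,2;
\tag{6.30}
$$

linear static output feedback Structure-Preserving strategies do exist, and
belong to the admissible strategy set defined by,

$$
\Gamma_i = \left\{ \gamma_i \;\middle|\; \gamma_i(y_i) = -F_iy_i = -\begin{bmatrix} F_{ii} & F_{i3} \\ 0 & F_{3i} \end{bmatrix} y_i \right\}; \quad i = 1,2
\tag{6.31}
$$

Substituting $u_i = \gamma_i(y_i)$ from (6.31) in (6.7) and (6.8), we get

$$ \dot{x} = \hat{A}x; \quad x(0) = x_0 \tag{6.32a} $$

$$ y_i = C_ix \tag{6.32b} $$

$$ J_i = \frac{1}{2}\int_0^\infty x'\hat{Q}_ix\,dt; \quad i = 1,2 \tag{6.33} $$

where

$$ \hat{Q}_i = C_i'(I + F_i'\hat{R}_iF_i)C_i; \quad i = 1,2; \tag{6.34} $$

and the closed-loop system matrix becomes,

$$ \hat{A} = A - B_1F_1C_1 - B_2F_2C_2 \tag{6.35} $$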

Define $M_i$ and $L$ as in (6.16), (6.17), and partition them as in (6.20). For any
given pair $(F_1, F_2)$ such that $\mathrm{Re}\,\lambda(\hat{A}) < 0$, $M_i > 0$ and $L > 0$ satisfy the
matrix Lyapunov equations

$$ M_i\hat{A} + \hat{A}'M_i + \hat{Q}_i = 0; \quad i = 1,2. \tag{6.36} $$

$$ \hat{A}L + L\hat{A}' + N = 0. \tag{6.37} $$

Applying the Matrix Minimum Principle [59], the optimal $\{F_{ii}^*, F_{i3}^*, F_{3i}^*\}$ for the
Feedback Nash solution can be shown to satisfy (for $i, j = 1,2;\ i \neq j$)
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^13^13^3^11  “  ^ll®ll^^il^Hl^ll '‘■“L^^13^lJ  (6.38a)  ij 


^11^11^3^13  ■*■  ^13^13^33^13  “  ^ll®ll^^il^H3^13 ''■  **13^^33^13^  (6.38b) 


^31  “  ^ljf®13**L^S3^13  ■‘'  ®13“i3^H3^13  ''*  ®l4“il^^l3^13  ■*■  ®l4^i3^^33^13^ 


X  [Cj^3L23C^3l 


(6.38c) 


where 


«(l-)r*  ,  m(^^a^*  j.  r*'M(^^j.  r^'  fT  4.  p*'r  p* 

M33  A33  +  A33M33  +  Mj^3  Aj^3  +  Aj^3Mj^3  +  Cj^3(I  +  Fj^3Rj^jlFj^3 


(6.39a) 


Ki(^  *  ^*iiv*ii>^ii 


(6.39b) 


^11^^13  ■*■  ^13^^33  ^il^L^"''  ^11^^  ^11^11^13^^13  "  ° 


(6.39c)  ^ 


■'■  ^31^1j^31^^13 


(6.39d) 


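$$ \hat{A}_{ii}^*L_{ii} + L_{ii}\hat{A}_{ii}^{*\prime} + \hat{A}_{i3}^*L_{i3}' + L_{i3}\hat{A}_{i3}^{*\prime} + N_{ii} = 0 \tag{6.40a} $$

$$ \hat{A}_{ii}^*L_{i3} + L_{i3}\hat{A}_{33}^{*\prime} + \hat{A}_{i3}^*L_{33} + N_{i3} = 0 \tag{6.40b} $$

$$ \hat{A}_{33}^*L_{33} + L_{33}\hat{A}_{33}^{*\prime} + N_{33} = 0 \tag{6.40c} $$

$\{\hat{A}_{ii}^*, \hat{A}_{i3}^*, \hat{A}_{33}^*;\ i = 1,2\}$ are as in (6.35) with
$\{F_{ii} = F_{ii}^*,\ F_{i3} = F_{i3}^*,\ F_{3i} = F_{3i}^*;\ i = 1,2\}$. The following proposition
highlights the multimodel nature of the Nash solution.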

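Proposition 6.2:

Given the linear system (6.7a) controlled by two DMs, their observation
sets (6.30), and their performance indices (6.8), the design of
Structure-Preserving Feedback Nash strategies under the FIS information pattern leads
to two low-order coupled optimization problems defined by

$$ \min\; E\left\{\frac{1}{2}\int_0^\infty \left(y_i'y_i + u_i'\hat{R}_iu_i\right)dt\right\} $$

subject to

$$
\dot{z}_i = \begin{bmatrix} A_{ii} & A_{i3} - B_{ji}F_{3j}C_{j3} \\ 0 & A_{33} - B_{j3}F_{3j}C_{j3} \end{bmatrix} z_i
+ \begin{bmatrix} B_{ii} & B_{i4} \\ 0 & B_{i3} \end{bmatrix} u_i; \quad z_i(0) = z_{i0}
$$

$$ y_i = \begin{bmatrix} C_{ii} & 0 \\ 0 & C_{i3} \end{bmatrix} z_i $$

where

$$
u_i = -\begin{bmatrix} F_{ii} & F_{i3} \\ 0 & F_{3i} \end{bmatrix} y_i; \qquad i, j = 1,2;\ i \neq j
$$

The solution to this pair of coupled optimization problems is given by the
set of equations (6.38)-(6.40).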

Now, unlike the earlier problem of Section 6.3.1, we need to verify that

$$ \mathrm{Re}\,\lambda(\hat{A}_{ii}^*) < 0; \quad i = 1,2,3 $$

for the solution to be well-defined. Also unlike the earlier case, the
Structure-Preserving Feedback Nash solution of Proposition 6.2 is completely
interacting. This is essentially due to the fact that equations (6.38a) and (6.38b)
cannot be solved explicitly for $F_{ii}^*$ and $F_{i3}^*$. Another significant difference
is that now all the optimal gains $\{F_{ii}^*, F_{i3}^*, F_{3i}^*;\ i = 1,2\}$ depend on the
statistics of the initial conditions.

Partial noninteraction results when $\{L_{i3} = 0;\ i = 1,2\}$. In this case the
optimal solution is given by ($i, j = 1,2;\ i \neq j$)

$$ F_{ii}^* = R_{ii}^{-1}B_{ii}'M_{ii}^{(i)}L_{ii}C_{ii}'\left(C_{ii}L_{ii}C_{ii}'\right)^{-1} \tag{6.41a} $$

$$ F_{i3}^* = R_{ii}^{-1}B_{ii}'M_{i3}^{(i)}L_{33}C_{i3}'\left(C_{i3}L_{33}C_{i3}'\right)^{-1} \tag{6.41b} $$

$$ F_{3i}^* = R_{ij}^{-1}\left[B_{i3}'M_{33}^{(i)}L_{33}C_{i3}' + B_{i4}'M_{i3}^{(i)}L_{33}C_{i3}'\right]\left(C_{i3}L_{33}C_{i3}'\right)^{-1} \tag{6.41c} $$

$\{M_{ii}^{(i)}, M_{i3}^{(i)}, M_{33}^{(i)}\}$ are obtained from (6.39) with the control gains given by (6.41),
$L_{33}$ is obtained from (6.40c), and $L_{ii}$ is the positive definite solution of

$$ \hat{A}_{ii}^*L_{ii} + L_{ii}\hat{A}_{ii}^{*\prime} + N_{ii} = 0 \tag{6.42} $$

Furthermore $\{F_{i3}^*, F_{3i}^*;\ i = 1,2\}$ are such that

$$ \hat{A}_{i3}^*L_{33} + N_{i3} = 0 \tag{6.43} $$

Now $F_{ii}^*$ is first obtained by each DM independently on solving equations (6.39b),
(6.41a) and (6.42). This is the optimal solution of an output regulator problem
with parameters $(A_{ii}, B_{ii}, C_{ii}, R_{ii}, N_{ii})$ [58].

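In cases when the output matrices do not split as in (6.30), the FPS
Structure-Preserving Nash strategies of Proposition 6.1 can be synthesized as
feedback strategies using dynamic observers.

We let,

$$
u_i = \hat{\gamma}_i(\hat{z}_i) = -\begin{bmatrix} F_{ii}^* & F_{i3}^* \\ 0 & F_{3i}^* \end{bmatrix}\hat{z}_i; \quad i = 1,2
\tag{6.44}
$$

where,

$$
\dot{\hat{z}}_i = \begin{bmatrix} A_{ii} & A_{i3} - B_{ji}F_{3j}^* \\ 0 & A_{33} - B_{j3}F_{3j}^* \end{bmatrix}\hat{z}_i
+ \begin{bmatrix} B_{ii} & B_{i4} \\ 0 & B_{i3} \end{bmatrix} u_i
+ K_i\left(y_i - \begin{bmatrix} C_{ii} & C_{i3} \end{bmatrix}\hat{z}_i\right); \quad i, j = 1,2;\ i \neq j
\tag{6.45}
$$

$\{F_{ii}^*, F_{i3}^*, F_{3i}^*;\ i = 1,2\}$ are given by equations (6.24), and $K_i$ is the observer
gain to be chosen by each DM.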

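Notice that the dimension of the observer of each DM is equal to the
dimension of his own observable eigenspace, which is all he needs to
reconstruct in order to implement the FPS Structure-Preserving strategy.

Defining $e_i = z_i - \hat{z}_i$, we get,

$$
\dot{e}_i = \begin{bmatrix} A_{ii} - K_{i1}C_{ii} & A_{i3} - B_{ji}F_{3j}^* - K_{i1}C_{i3} \\ -K_{i2}C_{ii} & A_{33} - B_{j3}F_{3j}^* - K_{i2}C_{i3} \end{bmatrix} e_i
= \bar{A}_ie_i; \quad i, j = 1,2;\ i \neq j
\tag{6.46}
$$

If we choose $K_i$ such that

$$ \mathrm{Re}\,\lambda(\bar{A}_i) < -\frac{1}{\epsilon_i}; \quad \epsilon_i > 0; \quad i = 1,2 \tag{6.47} $$

then we can write,

$$ \epsilon_i\dot{e}_i = \tilde{A}_ie_i; \quad \mathrm{Re}\,\lambda(\tilde{A}_i) < -1; \quad i = 1,2 \tag{6.48} $$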


Hence, by making the observer dynamics arbitrarily fast, we can represent the
error system as a stable singularly perturbed system, i.e., $e_i \to 0$ as $\epsilon_i \to 0$.
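The limiting behaviour of the observer error can be illustrated on a scalar error equation. The sketch below assumes a scalar system eps * de/dt = a_tilde * e with a hypothetical a_tilde = -1.5 (so that the scalar analogue of the eigenvalue bound in (6.48) holds) and unit initial error; it evaluates the exact solution at a fixed time for decreasing eps.

```python
import math


def error_at(t, eps, a_tilde=-1.5, e0=1.0):
    """Exact solution e(t) = e0 * exp(a_tilde * t / eps) of eps * de/dt = a_tilde * e."""
    return e0 * math.exp(a_tilde * t / eps)


# Fixed observation time, progressively faster observer (smaller eps).
t_fixed = 0.1
eps_values = (1.0, 0.1, 0.01)
errors = [error_at(t_fixed, eps) for eps in eps_values]

for eps, err in zip(eps_values, errors):
    print(eps, err)
```

At any fixed t > 0 the error shrinks as eps decreases, which is the sense in which the observer-based strategy approaches the state feedback strategy it reconstructs.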
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Rewriting  the  composite  system  end  the  feedback  strategies  as. 


X  -  Ax  +  BjSj 


^1«1  -  A^«i  ;  1  -  1,2  . 


(6.49) 


Ui-Yi(ac.ei)  -  - 


•  »u‘n 

^11*12  ^13“ 

r*F* 

'ii 

-  . 

X  + 

.  0 

0  ^3lJ 

_0 

4- 

1*1*  ^ 


1,2.  (6,50) 


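Since $e_i \to 0$ as $\epsilon_i \to 0$, $\hat{\gamma}_i$ converges in open-loop to a policy having a
unique feedback representation, which we denote by

$$ \hat{\gamma}_i^f(\hat{z}_i) = \gamma_i^*(x); \quad i = 1,2 \tag{6.51} $$

where $\{\gamma_i^*(x);\ i = 1,2\}$ is the FPS Structure-Preserving Feedback Nash strategy
of Proposition 6.1.

Due to (6.51) and the results of [25], we have

$$ \lim_{\|\epsilon\| \to 0} J_i(\hat{\gamma}_i, \hat{\gamma}_j) = J_i(\gamma_i^*, \gamma_j^*); \quad i, j = 1,2 \tag{6.52} $$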


It is to be noted that (6.50) is not the Feedback Nash strategy for the system
(6.49) and the performance indices (6.8) within the class of admissible
strategies defined by


Ti  -  (Yi  I  Y4(5.*i) 


F.jS,,  F.,6j 


rF 


11  '13 


F 


31-* 


F, 


13 

’^11  ^13" 

X  + 

31. 

0  F 

L  3lJ 

2^;  fixed  }  ;  1  -  1,2  (6.53) 


L 


^4 


The Feedback Nash strategy will in general depend on the choice of the
observer gains $K_i$. We do not compute it because the strategy given by (6.44),
or equivalently by (6.50), has the property of being near-equilibrium and
asymptotic Nash [20] as established by the following proposition.

Proposition  6.3; 

The strategy $\{\hat{\gamma}_i(\hat{z}_i);\ i = 1,2\}$ given by (6.44) (or (6.50)) is
near-equilibrium and asymptotic Nash within the class defined by (6.53). That is,

iim  {J  (Y.,Y.)  -  JAy,.y.)]  -  0;  V  ^  €  f . ;  i,j  -  1,2;  i  ^  j 
!!m-!I-o  j  ^  j-  j  1  i 


and, 

■  Ji<Vi.Vj)}  -  0  : 

V  Yj  ^  Tj  such  that  JjCY^.Yj)  <  JjCY^.Yj)  ;  iJ  *  1,2  ;  i  j*  j. 

The  proof  of  the  above  Proposition  follows  readily  from  the  results  of 
Chapter  2 . 



6.4.  Decoupling  of  Completely  Observable  Systems 

In situations when the whole system is completely observable through the
observation set of each DM, the 'core' is the full problem itself. But in
some such cases, if the DMs have access to all the states then the
observability decomposition can be induced by using state feedback. The role of the
decoupling control in reduced-order modeling has been studied in detail in
[62]. Here we shall outline the procedure for multiple DM problems.




Suppose after appropriate state space, input space and output space
transformations, the system can be put in the following form [62],


^11 

^12 

^13 

^1 

^14“ 

”0 

^21“ 

^21 

^22 

^23 

X  + 

0 

h2 

®22 

®24 

^31 

^32 

A33 

0 

0 

»23_ 

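$$
y_1 = \begin{bmatrix} C_{11} & 0 & C_{13} \end{bmatrix} x, \qquad
y_2 = \begin{bmatrix} 0 & C_{22} & C_{23} \end{bmatrix} x
\tag{6.54b}
$$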

with  1  -  1,2}  b*lng  square  and  nonsingular. 

Now  If  the  DMs  use  the  following  strategies. 


13^31 


‘®11^^12  '  *21*23^32^ 


-822(^21  *  ®12®13^31^ 


X  +  u. 


(6.5 


•^23^32  ° 


X  +  u„ 


(6.5 


then the resulting partially closed-loop system has the form,


^11  ‘  ®14®13^31 


0 


0 
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X  ■ 


0 


“^2^23^2 


X  + 


0 


"1+ 


0 


0 


(6.56) 


It can be readily seen that the system (6.56), (6.54b) has the desired form
of (6.7). Under appropriate assumptions, Proposition 6.1 can be applied to
design $\hat{u}_1$, $\hat{u}_2$ as FPS Structure-Preserving strategies.

It  is  significant  to  note  that  making  the  dimension  of  as  large  as 
possible results in a 'maximally-decoupled' system, i.e., a system in which
the decentralized control problems are of the highest possible dimension, and
consequently the 'core' problem is of lowest possible dimension [62].

The use of decoupling control introduces a degree of suboptimality if
the performance indices are chosen a priori. This is because the decoupling
control is chosen from a purely algebraic point of view without any optimality
considerations.

We would like to remark that the use of decoupling control requires a
degree of mutual cooperation among the DMs. This cannot be guaranteed under
the noncooperative Nash concept in general, unless the resulting advantages
constitute a strong enough incentive for the DMs to compensate for the
performance loss resulting from the use of decoupling control. But, within a
cooperative framework, the use of decoupling control can be readily ensured.

In problems where there is a need for the DMs to use the decoupling
control, it will be more appropriate for them to choose their performance
indices with respect to the strategies $\hat{u}_i$ after the decoupling has been
achieved. Again, this is easier to ensure in a cooperative framework than
in a noncooperative framework.

Hence, in situations when the decoupling control has to be used,
a semicooperative or cooperative framework is desirable for the application
of our techniques.


6.5.  Extensions 

In this section we shall briefly discuss extensions of our ideas
to many DM problems and cooperative Pareto games.


6.5.1.  Many  decision  maker  problems 


In situations with more than two DMs there is more than one way to approach
the problem, each approach resulting in a different order of simplification.
Ideally one would like to identify the individually observable modes, the
pairwise observable modes and so on; and overlap appropriately the input structure
of each DM with this observability decomposition. The design of
Structure-Preserving Nash strategies would then lead to the solution of low-order control
problems, problems where two DMs interact, problems where three DMs interact
and so on up to the core problem where all the DMs interact.

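In the three DM case the $(A, C_1, C_2, C_3)$ matrices in the observability
decomposition form will look like

$$
A = \begin{bmatrix}
A_{11} & 0 & 0 & A_{14} & 0 & A_{16} & A_{17} \\
0 & A_{22} & 0 & A_{24} & A_{25} & 0 & A_{27} \\
0 & 0 & A_{33} & 0 & A_{35} & A_{36} & A_{37} \\
0 & 0 & 0 & A_{44} & 0 & 0 & A_{47} \\
0 & 0 & 0 & 0 & A_{55} & 0 & A_{57} \\
0 & 0 & 0 & 0 & 0 & A_{66} & A_{67} \\
0 & 0 & 0 & 0 & 0 & 0 & A_{77}
\end{bmatrix}
$$

$$ C_1 = \begin{bmatrix} C_{11} & 0 & 0 & C_{14} & 0 & C_{16} & C_{17} \end{bmatrix} $$

$$ C_2 = \begin{bmatrix} 0 & C_{22} & 0 & C_{24} & C_{25} & 0 & C_{27} \end{bmatrix} $$

$$ C_3 = \begin{bmatrix} 0 & 0 & C_{33} & 0 & C_{35} & C_{36} & C_{37} \end{bmatrix} $$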

It can be readily seen that the number of blocks to be identified in the
system matrices grows exponentially as the number of DMs increases. Hence for a
large number of DMs such a decomposition may be difficult to achieve in
practice. The other extreme would be to identify only the modes observable by
each DM alone, and consider the rest as commonly observable modes. This will
result in only a first order of simplification because the core problem will
be of a higher dimension. Of course in practice, depending on the problem,
any approach in between these two extremes may be adopted, resulting in
different orders of simplification.
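The exponential growth can be made concrete by counting: each diagonal block corresponds to a nonempty subset of DMs that observe that group of modes, so a full decomposition for N DMs has 2^N - 1 diagonal blocks (3 for two DMs, 7 for three). The sketch below enumerates these subsets and the resulting nonzero pattern of each output matrix. The function names are illustrative, and the pairwise blocks are listed in lexicographic order, which differs from the ordering of the fifth and sixth blocks in the three DM matrices above.

```python
from itertools import combinations


def observability_classes(num_dms):
    """Nonempty subsets of DMs ordered by size: one diagonal block per subset."""
    dms = range(1, num_dms + 1)
    return [set(c) for size in dms for c in combinations(dms, size)]


def output_pattern(dm, classes):
    """Nonzero pattern of C_dm: block k is visible iff dm belongs to class k."""
    return [1 if dm in cls else 0 for cls in classes]


classes = observability_classes(3)
for dm in (1, 2, 3):
    print(dm, output_pattern(dm, classes))

# Block counts for an increasing number of DMs.
for n in range(2, 7):
    print(n, len(observability_classes(n)))
```

The count doubles (plus one) with every additional DM, which is why the text suggests stopping at an intermediate level of decomposition when N is large.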


6.5.2.  Pareto  games 

Multimodel solutions to cooperative Pareto games based on the structural
decompositions of Section 6.2 can be obtained in a straightforward manner. To
illustrate this point we shall give below the Structure-Preserving Pareto
strategies under the FPS information pattern.

Define the overall system cost as

$$ J = \sum_{i=1}^{2}\alpha_iJ_i; \quad 0 < \alpha_i < 1; \quad \sum_{i=1}^{2}\alpha_i = 1 \tag{6.57} $$

Applying the Matrix Minimum Principle, the FPS Structure-Preserving Pareto
strategy $\{\gamma_i^*;\ i = 1,2\}$, defined by (6.9) for the system (6.7) and performance
index (6.57), is obtained as (for $i = 1,2$),


Ki 


(6.58a) 


Kl  ^13 


(6.58b) 


31 


^  —  1  ^  f  1 

"  ®ij  ®i3  ^  a  “33 


"ll'‘13''33  ^ 


(6.58c) 


ihere 


^ii^ii  ^11®*!!  ^ii^li  " 


^11^13  **13^33  '*■  ^11^13  ®il®ii®i3  "  ° 


^3^33  ^33^33  ■*“  *1^13^13  *  ^i3**13  ^13^13  ^i3®ii®i3^ 


+  “i<®12®31  +  “2^52^1^* 


32 


(6.59a) 

(6.59b) 


0 


(6.59c: 
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A*  L 


A*  L 
^13*^33 


^13^33 


+  N 


i3 


(6.60a)  " 


^33^33  ■*■  ^33^33  ■*■  ^33  *  °  (6.60b) 

The controllability-observability of the triple $\{(A_{ii}, B_{ii}, C_{ii});\ i = 1,2\}$
guarantees $\{\mathrm{Re}\,\lambda(\hat{A}_{ii}^*) < 0;\ i = 1,2\}$. For the solution to be well-defined we need
only to verify that $\mathrm{Re}\,\lambda(\hat{A}_{33}^*) < 0$. The solution given by (6.58)-(6.60) has
features similar to the Nash problem of Section 6.3.1 (like partial noninteraction).

The Structure-Preserving Pareto strategies under the FIS information pattern can
be obtained in a similar manner. The solution will have features similar to
the Nash problem of Section 6.3.2.


6.6  Applications 

Now  we  shall  examine  the  applicability  of  our  design  methodologies  to  the 
control  of  large  scale  interconnected  subsystems  and  multiarea  power  systems. 

6.6.1.  Large  scale  interconnected  subsystems 

Consider  the  large  scale  system  wherein  each  subsystem  is  controlled  by 
one  DM  having  his  own  performance  objective.  The  system  considered  is  of  the 


-  A^^x^+ 


•N 

r 

j-i 


form 


(6.61a) 


where the output variables are the interconnection variables. The above
problem has been considered in [55] as a single DM problem. We shall
demonstrate that when viewed as a multiple DM problem, the techniques developed in
this paper can be applied for optimal strategy design.

For simplicity we shall consider the two subsystem case (N = 2). As in
[55], suppose that each subsystem is transformed with respect to its own output.
The transformed system can be represented as


By  a  simple  reordering  of  variables  (6.62)  can  be  written  as, 

s.!  r-s’  I  •  I  -S’  -in  M  r-si  r 


'21  *2i 


pd)  pl2 

Fjl  r 


+  Uj^+  -«-« 


“2  (6.63) 


f(2)  ^ 

*11  ’1 


Now, by making an appropriate input space transformation [55] and letting DMi use
his own residual state feedback to cancel the terms $F_{11}^{(i)}x_{ir}$, we obtain a system
which is in the familiar observability decomposition form. The interconnection
variables represent the variables observable by both the DMs, and the
residual variables represent the variables observable by DMi alone.

Suppose  DMi  chooses  his  performance  index  as  , 

00 

Ji  •  ^^i^i  ^ir^i*ir  ;  i  -  1,2  (6.64) 

where $u_i$ = decoupling control + $\hat{u}_i$; then, assuming that each DM has access to
all the interconnection variables and his own subsystem variables,
Structure-Preserving linear Feedback Nash strategies $\hat{u}_i$ can be generated from multimodel
solutions of Proposition 6.1.

6.6.2.  Two-area power system

This  example  has  been  considered  in  [55]  in  the  single  DM  context. 

Here  we  shall  assume  that  each  area  is  under  a  different  control  authority. 

We  shall  first  transform  the  system  into  our  desired  form  given  by  (6.7), 
and  then  obtain  Pareto  strategies  on  solving  equations  (6.58)-(6.60) . 

A  two-area  power  system  with  each  area  containing  two  thermal 
plants  is  constructed  from  [60].  The  system  is  modeled  by 


$$
\begin{aligned}
\dot{x} &= Ax + B_1u_1 + B_2u_2 \\
y_i &= C_ix; \quad i = 1,2
\end{aligned}
\tag{6.65}
$$

where $x \in R^{19}$, $u_1 \in R^2$, $u_2 \in R^2$, $y_1 \in R^2$, $y_2 \in R^2$. The state, control and
output variables are defined in Appendix D.

The  system  matrices  are  given  by, 


,2 


In  (6.66).  The  decoupling  control  is  chosen  to  be. 


1^ 


U2  °  °  -0.716  1.168]  Xj^  +  u 

Substituting  (6.67)  In  (6.66)  we  obtain. 


where 


(1)  -(2) 
11  “  ^11 


-5  4.75  0 

-2.864  2.612  0 

0  0-2 

0  0  0 


0  0  0 

0  -2.864  4.612 

2  0  0 

0.167  0.167  0

Now the system is precisely in a form suitable for our design techniques.
The frequency deviations in the two areas and the tie-line power flow
comprise part of the core variables $x_3$. The variables $x_1$, $x_2$ are the
residual variables associated with each area.

The nineteenth-order game in its original form (6.65) may be
computationally intractable. But in the form (6.68), and allowing only
FPS Structure-Preserving strategies, we need only to solve two sixth-order
optimal control problems, and one seventh-order problem where the two
DMs interact (6 + 6 + 7 = 19).

For the Pareto-optimal design, the cost functionals are chosen
to be

$$ J_i = \frac{1}{2}\int_0^\infty \left(x_i'Q_{ii}x_i + x_3'Q_{i3}x_3 + \hat{u}_i'\hat{R}_i\hat{u}_i\right)dt; \quad i = 1,2 $$

with

$Q_{11}$ = diag (10, 10, 10, 10, 1, 1)

$Q_{22}$ = diag (12, 15, 10, 5, 5, 5)

$Q_{13}$ = diag (10, 7, 0, 0, 0, 0, 0)

$Q_{23}$ = diag (0, 5, 10, 0, 0, 0, 0, 0)

$\hat{R}_1$ = diag (10, 25)

$\hat{R}_2$ = diag (5, 20) ; Cov $(x_0)$ = N = I.


2  3 

Case  1;  Pareto  cost  J  “  3  +  J  *^2 


The  optimal  gains  F^^  are  first  obtained  from  optimal  control 
problems  (6.58a),  (6.59a) 


Fii  -  (-0.167  -0.722  0.0132  0.571  0.043  0.13] 
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^22  “  -1.082  0.017  0.528  0.042  0.714]. 


*  * 


Then  the  optimal  gains  are  obtained  from  the  coupled 

equations  (6.58b,c),  (6.S9b,c)  and  (6.60), 
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’31  " 
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-16.44 

-0.653 

-0.068 

0 

0] 

* 

‘32  “ 

(-17.08 

0.568 

16.33 

0 

0 

-8.21 

-0.121] 

The  closed-loop  eigenvalues  turn  out  to  be  -0.2^0.51,  -0.24+10.48. 
-0.39+j0.05,  -0.52+j0.07,  -1.0^1. 5,  -1.99,  -2,  -2.09,  -2.21,  -2.21, 
-5.1^1.92,  -7.09±jl.96. 

The  feedback  strategies  are  obtained  as. 
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1 

0.314 
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-0.144 

-1.082 

0.402 

-0.197 

0.519 

0.811 

-0.027 

0.776 

-1.168 

0.716 

-0.057 

0.037 

-0.714 

-0.042 

-0.528 

-0.017 
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-0.716 

0 

0 

Case  2;  Pareto  cost  J  -  0.1  +  0.9 


The  gains  do  not  change  and  remain  the  same  as  before.  The 
gains  obtained  as 
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13 

[-6.72 

-18.33 

4.47 

* 

23  “ 

[-5.91 

-19.38 

3.72 

* 

m 

31 

[21.26 

4.715 

-20.38 

* 

'32  " 

[-15.15 

0.481 

14.77 

0  -13.08  •  3.17] 


F32  -  [-15.15  0.481  14.77  0  0  -0.514  -0.131]  . 

The  closed-loop  eigenvalues  turn  out  to  be  -O.l^jO.56,  -0.172,  -0.39+j0.05 
-0.52+j0.07,  -0.61,  -1.03+jl.l5,  -1.99,  -2.0,  -2.21,  -2.21,  -3.12, 
-5.18+jl.92,  -7.0^1.96. 

The  feedback  strategies  are  obtained  as. 


^  -0.722  0.167  0,412 

•k 
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•0.571  -0.013 

0  0 


-0.111 


0.884  ; 
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m  lo. 


.198  0.812 

.626  -0.012 


-0.133  -1.082  0.402  -O.lSfi  0.363 

0.494  -1.168  ' 
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n 

Case  3:  Pareto  cost 

remains 

J  «  0.9  Jj^  +  0.1 
* 

t  the  same.  ^^^3, 

•^2 

F31  are 

obtained 

as. 

F*3  -  1-2.82 

-13.19  1.11 

13.73 

-3.88 

0  0] 

F23  -  (-7.75 

-25.62  6.51 

0 

0 

-11.21 

5.34] 

a. 

F*^  -  (13.91 

0.366  -14.48 

-0.489 

-0.041 

0  0] 

F32  -  (-18.19 

0.614  17.56 

0 

0 

-1.112 

-0.291]. 

The  cloaed'loop  eigenvalues  turn  out  to  be  -0.1^0.66,  -0.158,  -0.39^0.05, 
-0.52+J0.07,  -0.542,  -0.9aryl.52,  -1.99,  -2.16,  -2.21,  -2.21,  -5.18ijl.92, 
-7.0^1.96. 

The feedback strategies are obtained as,

Notice that the strategy of each DM requires a knowledge of the
states of his own area and only the frequency deviation of the other area;
a feature desirable from an implementation point of view. The time responses
of the tie-line power flow and frequency deviations of the two areas are
plotted in Figures 6.1-6.3. It can be seen that the response of the
frequency deviation corresponding to the area weighted lightly in the
Pareto cost is more oscillatory, which is what one would expect. The
response of the tie-line power flow does not change significantly in the
three cases.

6.7.  Conclusions

In this chapter we have examined the role of the observability
structure in multiple decision maker problems. By identifying explicitly
the observability decomposition induced by the observation sets of the
DMs, and by overlapping appropriately the input structure of each DM,
we have shown that the design of Structure-Preserving Feedback Nash
strategies leads to multimodel solutions. Under the FPS information
pattern, the multimodel solutions are shown to admit partial noninteraction
among the DMs. Under the FIS information pattern, Structure-Preserving
strategies involving only linear static output feedback do not exist in
general. When the output matrices split so that there are separate
observation channels for the individually and commonly observable modes,
Structure-Preserving strategies do exist and are again generated from
multimodel solutions. But in this case, the solution is completely
interacting unless certain conditions on the statistics of the state variables
are satisfied. When the output matrices do not split, the FPS
Structure-Preserving strategies can be synthesized using observers with arbitrarily
fast dynamics. This strategy has the property of being near-equilibrium
and asymptotic Nash. When the system is completely observable by each
DM, the observability decomposition can be induced by using the decoupling
controls. But in such situations, a semi-cooperative or cooperative
framework is desirable.

Applications to the control of large scale interconnected subsystems
and control of multiarea power systems have been examined; and extensions
to many DM problems and cooperative Pareto games have been discussed.


CHAPTER  7 


CONCLUSIONS 

The main thrust of this thesis has been towards analyzing the
interaction between model simplification and control strategy design in a
multimodel context. We have studied several realistic situations which
allow the decision makers to use different simplified models of the system.

In  Chapters  2-4,  we  have  established  the  well-posedness  of 
multimodel  generation  by  'k-th  parameter  perturbation'  for  classes  of 
linear  multiparameter  singularly  perturbed  systems.  In  Chapter  2  we  have 
considered  deterministic  models  without  the  weak-coupling  assumption  on 
the  fast  subsystems  and  obtained  near-optimal  decentralized  strategies 
from multiple noncausal reduced-order models. In Chapters 3 and 4 we
have considered a stochastic version of the model considered in [15,16]
with decentralized observations for the decision makers. In Chapter 3
we developed multimodel solutions for a Nash game with a prespecified
finite-dimensional compensator structure for each decision maker. In
Chapter  4  we  developed  multimodel  solutions  for  team  problems  with 
sampled  observations  for  the  decision  makers.  Both  the  static  team 
problem  and  the  dynamic  team  problem  with  one-step-delay  observation- 
sharing  pattern  have  been  considered. 
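
The reduced-order models referred to above rest on the standard quasi-steady-state reduction of a singularly perturbed system. As a minimal sketch, assume a hypothetical scalar slow state $x$ and scalar fast state $z$ with $\dot{x} = a_{11} x + a_{12} z$, $\epsilon \dot{z} = a_{21} x + a_{22} z$ and $a_{22} < 0$. Setting $\epsilon = 0$ gives the reduced model $\dot{x} = a_0 x$ with $a_0 = a_{11} - a_{12} a_{21} / a_{22}$. The code compares the eigenvalues of the full system with $a_0$ and $a_{22}/\epsilon$; the single-parameter setting and the numerical values are illustrative assumptions and are not taken from the thesis.

```python
import cmath


def reduced_slow_pole(a11, a12, a21, a22):
    """Quasi-steady-state slow pole a0 = a11 - a12*a21/a22 (requires a22 != 0)."""
    return a11 - a12 * a21 / a22


def full_poles(a11, a12, a21, a22, eps):
    """Eigenvalues of [[a11, a12], [a21/eps, a22/eps]], returned as (slow, fast)."""
    trace = a11 + a22 / eps
    det = (a11 * a22 - a12 * a21) / eps
    # Roots of the characteristic polynomial s^2 - trace*s + det = 0.
    root = cmath.sqrt(trace * trace - 4.0 * det)
    poles = [(trace + root) / 2.0, (trace - root) / 2.0]
    # The slow pole is the one of smaller magnitude.
    return tuple(sorted(poles, key=abs))


if __name__ == "__main__":
    a11, a12, a21, a22 = -1.0, 1.0, 1.0, -2.0  # made-up coefficients
    a0 = reduced_slow_pole(a11, a12, a21, a22)
    for eps in (0.1, 0.01, 0.001):
        slow, fast = full_poles(a11, a12, a21, a22, eps)
        print(f"eps={eps}: slow={slow.real:.5f} (a0={a0}), "
              f"fast={fast.real:.2f} (a22/eps={a22 / eps:.2f})")
```

As $\epsilon$ decreases, the slow eigenvalue of the full system approaches $a_0$ with an error of order $\epsilon$, which is the sense in which a reduced-order model is a valid approximation.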

In Chapter 5 we have considered the average-cost-per-stage
problem for finite-state Markov chains. The focus was on obtaining near-optimal
incentive policies for controlled Markov models consisting of N
weakly-coupled groups of strongly-interacting states. A hierarchical
algorithm, which allowed for multimodeling on the part of the 'local'
decision makers, has been proposed for computing the near-optimal incentive
policies.
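
The weak-coupling structure mentioned above can be illustrated on an uncontrolled chain. The following is a minimal sketch and is not the hierarchical incentive algorithm of Chapter 5: it assumes a hypothetical four-state chain made of two groups, estimates the stationary distribution by solving each group in isolation and then a two-state aggregated chain, and compares the estimate with the exact stationary distribution. The helper names, the coupling pattern and all numbers are assumptions.

```python
def stationary(P, iterations=20000):
    """Stationary row vector of an ergodic stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def build_chain(B1, B2, eps):
    """Hypothetical 4-state chain: moves inside a group follow B1 or B2 scaled by
    (1 - eps); with probability eps, state k jumps to state k of the other group."""
    P = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            P[i][j] = (1.0 - eps) * B1[i][j]
            P[2 + i][2 + j] = (1.0 - eps) * B2[i][j]
        P[i][2 + i] = eps
        P[2 + i][i] = eps
    return P


def aggregated_estimate(P, groups):
    """Estimate the stationary law from per-group models plus an aggregated chain."""
    # Stationary law of each group in isolation (rows renormalized inside the group).
    local = []
    for g in groups:
        block = [[P[i][j] for j in g] for i in g]
        block = [[x / sum(row) for x in row] for row in block]
        local.append(stationary(block))
    # Aggregated chain: probability of moving from group k to group l,
    # averaged over the local stationary law of group k.
    m = len(groups)
    agg = [[sum(local[k][a] * sum(P[i][j] for j in groups[l])
                for a, i in enumerate(groups[k]))
            for l in range(m)] for k in range(m)]
    mass = stationary(agg)
    estimate = [0.0] * len(P)
    for k, g in enumerate(groups):
        for a, i in enumerate(g):
            estimate[i] = mass[k] * local[k][a]
    return estimate


if __name__ == "__main__":
    B1 = [[0.5, 0.5], [0.2, 0.8]]
    B2 = [[0.9, 0.1], [0.6, 0.4]]
    P = build_chain(B1, B2, eps=0.01)
    exact = stationary(P)
    approx = aggregated_estimate(P, [[0, 1], [2, 3]])
    print("max error:", max(abs(e - a) for e, a in zip(exact, approx)))
```

Each group is solved with a two-state model and only the aggregated chain sees the coupling, so no step requires the full four-state model; the price is an error that shrinks with the coupling strength.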

In  Chapter  6  we  have  taken  an  aggregation-based  approach  to 
multimodeling.  Based  on  input-output  considerations,  we  restructured 
the  problem  in  such  a  way  that  the  optimal  solution  within  a  class  of 
admissible  strategies  (defined  as  Structure-Preserving  strategies)  could 
be  obtained  from  multiple  reduced-order  models.  In  some  cases,  the 
solution  has  the  desirable  feature  of  partial  noninteraction  among  the 
decision  makers. 

The  main  contribution  of  this  thesis  has  been  towards  strengthening 
and  extending  the  multimodeling  concept  beyond  the  framework  within  which 
it  was  originally  introduced  in  [14,15].  We  have  achieved  this  by 
examining  three  different  approaches  to  multimodeling.  The  first  approach 
(same  as  in  [14,15])  has  been  to  establish  the  validity  of  a  rational 
multimodel generation scheme which is chosen a priori. The results of
Chapters 2-4 have strengthened this approach by establishing the
'robustness' of multimodel generation by 'k-th parameter perturbation',
proposed in [15], to a class of solution concepts and information patterns.
The  next  two  approaches  have  extended  the  multimodeling  concept  beyond  the 
framework  of  [14,15].  The  second  approach,  taken  in  Chapter  5,  has  been  to 
develop  a  numerical  algorithm  for  computing  near-optimal  policies,  which 
allows  the  decision  makers  to  use  multiple  reduced-order  models.  The  final 
approach,  taken  in  Chapter  6,  has  been  to  induce  multimodel  solutions  by 
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an  appropriate  restructuring  of  the  problem  and  a  suitable  choice  of 
admissible strategies. The results of this thesis have revealed the
interplay between model simplification tools like time-scales,
weak-coupling, controllability-observability, and strategy design concepts
like team, Nash and Stackelberg.

There are many possible directions for further research along the
lines of the results obtained in this thesis. For the models considered
in Chapters 2-4, Stackelberg problems with dynamic information (with/without
memory) for the leader [64] can be analyzed. Also, multimodeling possibilities
can be explored for nonlinear deterministic and stochastic models of the type
considered in [67,68]. For the Markovian models considered in Chapter 5, it
will be rather straightforward to analyze the finite horizon and infinite
horizon discounted cost problems with state information. A nontrivial
extension would be to problems with decentralized imperfect information for
the decision makers [66]. In the aggregation-based approach of Chapter 6,
we have assumed an 'exact' system decomposition. A possibly more practical
problem would be to consider situations when there is only a 'weak'
decomposition of the system. A perturbational decomposition-aggregation
approach could be developed to obtain near-optimal policies for such
problems.

Possibilities  for  a  multimodel  design  approach  based  on 
overlapping  decompositions  [69]  and  state  vector  partitioning  [70]  can 
also  be  explored. 
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APPENDIX  A 

MATRIX  DEFINITIONS  APPEARING  IN  CHAPTER  2 
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MATRIX  DEFINITIONS  APPEARING  IN  CHAPTER  3 
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PROOF  OF  PROPOSITION  4.1 

The unique optimal solution to the static team problem defined by
(4.11a), (4.14) and (4.12) is given by [36],
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m  '  !■!  ®  ool  111  ii  11  ' 


(C12)


a'-  ^(t)  -  P,C, -B'S(L. +L,)  +  3B!S$(t,t  );  i,j  =  l,2;  i#j 
O  llllj  1.0 


-  P]^  +  B^SV^;  i=l,2  (Cl3) 

Ap\t)  -  B^SV^;  i,j-l,2;  (C14) 

-  AV^  +  B^B^S^[P^-LjZ.l-B^B^K^;  V^(t^)  -  0  i.j  -1,2;.  (C15) 

$\dot{W} = AW + WA' + FF'; \quad W(t_0) = 0$  (C16)

$\dot{\Phi} = A\Phi; \quad \Phi(t_0, t_0) = I.$  (C17)
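
Equations (C16) and (C17) are linear matrix differential equations that can be integrated numerically. As a minimal sketch, assuming a constant matrix A, an initial time of zero, and a fixed-step fourth-order Runge-Kutta scheme (none of which is prescribed by the source), the code below propagates W and the transition matrix together. The helper names are hypothetical.

```python
import math


def mat_mul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_scale(c, X):
    return [[c * x for x in row] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def rk4_step(f, X, h):
    """One classical Runge-Kutta step for the autonomous matrix ODE dX/dt = f(X)."""
    k1 = f(X)
    k2 = f(mat_add(X, mat_scale(h / 2.0, k1)))
    k3 = f(mat_add(X, mat_scale(h / 2.0, k2)))
    k4 = f(mat_add(X, mat_scale(h, k3)))
    incr = mat_add(mat_add(k1, mat_scale(2.0, k2)),
                   mat_add(mat_scale(2.0, k3), k4))
    return mat_add(X, mat_scale(h / 6.0, incr))


def propagate(A, F, t_final, steps=1000):
    """Integrate W' = AW + WA' + FF', W(0) = 0 and Phi' = A Phi, Phi(0) = I."""
    n = len(A)
    At = transpose(A)
    FFt = mat_mul(F, transpose(F))
    W = [[0.0] * n for _ in range(n)]
    Phi = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    h = t_final / steps

    def w_dot(X):
        return mat_add(mat_add(mat_mul(A, X), mat_mul(X, At)), FFt)

    def phi_dot(X):
        return mat_mul(A, X)

    for _ in range(steps):
        W = rk4_step(w_dot, W, h)
        Phi = rk4_step(phi_dot, Phi, h)
    return W, Phi


if __name__ == "__main__":
    # Scalar check: A = -1, F = 1 gives W(t) = (1 - exp(-2t))/2 and Phi(t) = exp(-t).
    W, Phi = propagate([[-1.0]], [[1.0]], t_final=1.0)
    print(W[0][0], (1.0 - math.exp(-2.0)) / 2.0)
    print(Phi[0][0], math.exp(-1.0))
```

The scalar case has a closed-form solution, which makes it a convenient check before applying the same routine to a partitioned system matrix.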


To prove (a) we express $S(t)$, $S_i(t)$, $L_i(t)$, $K_i(t)$, $P_i(t)$ in partitioned form as


^00 

®1^01 

®2^02 

^00 

^roi 

E  S^^>  1 
^2^02 

S(t)  - 

^l^il 

®1®11 

;  s^(t)  - 

E  S^^>’ 

^roi 

^1  2^12 

*2^02 

hhz 

M 

E  s(^>' 
^2^02 

E  S^^> 

2  22 

^00 

^01 

h2 

•H 

%P 

1 _ 

rp(^>i 

o 

Ljl^(t)  - 

^10 

hi 

h2 

,  K^(t)  - 

E 

hh 

.  Pj^(t)  - 

p»> 

L^20 

^21 

h2  J 

L®2h  J 

Substituting these forms in (C4), expanding out and neglecting $O(\|\epsilon\|)$ terms, we
obtain


^i  "  ^®0i^00  ®ii®Oi  ^^^0  ’^00  ^is"^0i^if^  ^  ®ii^ii  ^^i  Wo  ^is  4i  ^if^ 


b' 

®ii  *“1 


(C18)


Substituting the partitioned forms in (C5) and (C6)-(C8) and taking the
limit as $\|\epsilon\| \to 0$ we get
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H 

®00 

- 't.  +  0<1I*I1> 

®01 

"  'l»  *01  *11  *lt  *11  *  *11  ■  *11**11*11®1£ 

-r- 

^ii 

■  *1£  +  °<1I*P 

" 

7(J) 

^00 

-Lj,  +oq|.ii) 

T(J) 

Hi 

-*t(t.v  +oq|.|i) 

7(J) 

^Oi 

-  oqi'lP 

n 

®ij 

-oq|.||) 

'  -• 

4o 

■  ^®l‘*lf  *11^  *11^  *01  *ls  *J»  ®<I1*IP 

1 

7(1) 

-*1.  +o(ll«ll) 

K." 

•v: 

7(i) 

*^i 

•  [Oj  -  Sjj  Ajj  B^jl  Bjj  Sjj  Pjj  +  Oq|c||) 

^(i) 

-Ki,  +oq!.|i) 

Kf> 

**1£®1  'll*!,  +'><ll«ll> 

(C19) 


where 


[A'i  Bjj,  +  ij-l  S^j  Bji  +  (A-J  B^jXaJJ  S^,  B^^l 


i,j-l,2;  1  f  j 

Substituting (C19) into (C18) and manipulating terms we obtain,

h-\,  ■  'll  s«  flf  + 


(C20) 


Next, consider the second term of $u_i(t)$ from (C1):

S  X  (t)  -  (Bq^  Sqq  +  s‘^)  Tf^Ct)  +  B^^  T\^(t)  (C21) 

Substituting the partitioned form of $S(t)$ in (C2) and taking the limit as
$\|\epsilon\| \to 0$ we obtain


=00  •  +  “<1I‘IP 

^01  *  “oi  'll  'l£  +  “'ll ‘IP 

'll''lf  'lj-“(|l‘ll> 

(C22)

Using (C22) in (C3) and taking the limit as $\|\epsilon\| \to 0$, it can be shown that


Tljlt)  +0q|«|l)  (C23) 

fi(t)  -  fij(t)  +  ill  'll  ('oi  'oo  +  'll  'oi>  °(ll‘ll ) 

Substituting  (C23),(C24)  in  (C21)  and  rearranging  terms  we  obtain 

'i  S  «  (t)  -  B’j  S,  fo^Ct)  +  Bij  Tfij(t)  +  oqi.ll)  (C25) 


Therefore,  (C25)  and  (C20)  imply 

$u_i(t) = u_{im}(t) + O(\|\epsilon\|); \quad i = 1,2; \quad t \in [t_1, t_2]$.

The reason the above approximation is valid only on a subinterval is that
we have neglected the boundary-layer terms.

To  prove  (b),  we  need  to  obtain  the  limiting  expressions  for  the 
variables  V^,  W  and  consequently  for  andA^"^^. 
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—1 

K** 

-J 

o 

o 

1 

^01 

1 - 

CM 

O 

s 

v<i) 

''l 

and  W  " 

w* 

^*01 

^11 

Wi2 

v<i> 

w' 

**02 

w' 

“l2 

"1 

C>l 

Substituting the partitioned forms of $V_i$ and $W$ in (C15), (C16) and taking
the limit as $\|\epsilon\| \to 0$ we obtain

'  ’'u  +®<II‘P 


+  oq|.|i) 


vf^  +  dl'P 

"oo  •«. +“<11*11  > 

“tl  ■«!  +“<ll*il> 

“oi  -“ij  -  “<ll«il) 


(C26) 


Substituting (C19), (C20), (C22) and (C26) in (C12)-(C17) and manipulating
terms results in


'<o“  ■  ^Os’  +  +  “<"*") 

+  A^^^  +  0(11  £ II) 
i  is  if 

A^^^  -  A^^^  +  OCIIsll  ). 


(C27) 


Using these limiting values in (C10) and (C11) and simplifying the terms gives
the desired result.
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APPENDIX  D 

MODEL  VARIABLES  OF  THE  POWER  EXAMPLE  OF  CHAPTER  6 

$x_1, x_{12}$ = valve position displacement in first thermal unit of area 1 and 2.

$x_2, x_{13}$ = power output displacement of HP turbine in first thermal unit of area 1 and 2.

$x_3, x_{14}$ = power output displacement of IP turbine in first thermal unit of area 1 and 2.

$x_4, x_{15}$ = power output displacement of LP turbine in first thermal unit of area 1 and 2.

$x_5, x_{16}$ = valve position displacement in second thermal unit of area 1 and 2.

$x_6, x_{17}$ = power output displacement of HP turbine in second thermal unit of area 1 and 2.

$x_7, x_{18}$ = power output displacement of IP turbine in second thermal unit of area 1 and 2.

$x_8, x_{19}$ = power output displacement of LP turbine in second thermal unit of area 1 and 2.

$x_9, x_{11}$ = frequency deviation of area 1 and 2.

$x_{10}$ = tie-line power flow connecting area 1 and 2.

u^^^u^^^a  set-point  adjustment  of  first  thermal  unit  in  area  1  and  2. 

u^^^u^^^a  set-point  adjustment  of  second  thermal  unit  in  area  1  and  2. 

y^^^y^^^a  frequency  deviation  of  area  1  and  2. 

y^^^yj^^a  tie-line  power  flow  of  area  1  and  2. 


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